Abstract


The research findings of the AFOSR Grant AFOSR-86-0196, “Optical Symbolic Computing
Tasks,” are summarized. The grant period was 1 June 1986 - 29 November 1989. Specifically,
we have concentrated on the following topics: complexity studies for optical neural and digital
systems, architecture and models for optical computing, learning algorithms for neural networks,
and applications of neural networks to early vision problems such as image restoration, texture
segmentation, computation of optical flow, and stereo. A number of conference and journal papers
reporting the research findings have been published. A list of publications and presentations is
given at the end of the report along with a set of reprints.

1 Complexity of Optical Neural and Digital Systems

1.1  Digital  Optical  Parallel  System  Complexity 

Our study of digital optical system complexity has included a comparison of optical and electronic
interconnection network complexity, and a study of design and complexity tradeoffs for
the implementation of a shared memory parallel computer. The complexity of some common
interconnection networks has been analyzed in detail for optical and electronic VLSI
implementations. The optical system used for analysis was the hybrid 2-hologram interconnection system
of Jenkins et al. Area complexity was compared and found to be


                            VLSI               OPTICS

Banyan                      $O(n^2)$           $O(n \log^2 n)$
Shuffle/Exchange            $O(n^2/\log n)$    $O(n^2 \log n)$
Hypercube                   $O(n^2)$           $O(n \log^2 n)$
2-D Cellular Hypercube      $> O(n^2)$         $O(n)$


It should be noted that the electronic results have benefited from a great deal of work on using various
clever tricks and algorithms to reduce the result to near optimum. The optics case was only
investigated  by  us  and  can  likely  be  reduced  further  by  using  different  layouts.  The  Banyan  and 
shuffle/exchange  networks  are  isomorphic  and  for  them,  optics  has  lower  complexity  for  large  n. 
An  example  of  how  the  optical  complexity  can  be  lowered  can  be  seen  in  the  hypercube  network. 
The  2-D  cellular  hypercube  is  identical  to  two  overlapping  hypercube  networks,  so  it  has  twice  as 
many  interconnections,  yet  its  optical  area  complexity  is  much  lower  because  it  is  space-invariant. 


We  have  more  recently  applied  our  expertise  and  results  in  complexity  analysis  to  the  use  of 
optics  in  the  implementation  of  a  parallel  digital  shared  memory  computer.  This  is  described  in 
Section 2.1.


1.2  Connectivity  and  Hierarchical  Neural  Networks 

In neural networks the connectivity can be very high, and in many cases the nets are even fully
connected. As has been shown by Psaltis, even optical systems may not be able to provide
this much connectivity for nets with large numbers of neuron units (i.e., 2-D arrays of neuron
units). One technique for optically reducing the physical interconnection requirements is to take
advantage of any symmetry or regularity in an interconnection. Since neural nets are particularly
useful for random problems, and this may imply random interconnections, at first thought
utilizing symmetry may not seem plausible. In the case of vision, however, there is typically a
high degree of regularity or symmetry to the interconnection. In addition, even for other
applications, many nets may have a hierarchical structure, and this can often imply repetition in
the interconnections. For example, a network that utilizes a number representation scheme with
binary neurons may have the same interconnections repeated for each group of neurons that
represents a number. While number representation techniques that have been described for neural
networks do not have this repetition, we have found that variants of them do. We have designed
such network structures that have repeated blocks, one for each represented number, and have
incorporated proper update rules for the neurons to ensure convergence of the net. This work
has focused on single layer feedback networks used for combinatorial optimization. This yields
a hierarchical network in the sense that each block represents the lower level, and the
interconnections from block to block represent the higher level. It may also be extendable to hierarchies
with more than two levels.


2 Architecture and Models for Computing

2.1  Computation  Models  and  Digital  Shared  Memory  Architectures 

We  have  investigated  abstract  models  of  parallel  computation  that  have  been  invented  and  studied 
fairly  extensively  by  the  computer  science  community.  These  models  are:  (1)  inherently  parallel, 
(2)  abstract  and  thereby  divorced  from  the  constraints  of  any  technology,  and  (3)  more  powerful 
than  physically  realizable  parallel  computers.  We  have  found  that  these  models  have  substantial 
implications in the design of optical or hybrid optical/electronic parallel digital computers. Using
these models (instead of parallel electronic computer architectures) as a starting point in the
design  of  parallel  optical  machines  can  potentially  yield  novel  and  more  powerful  architectures 
that  can  take  better  advantage  of  the  characteristics  of  optical  hardware.  It  is  interesting  to  note 
that  the  primary  aspects  of  these  models  that  make  them  impossible  to  implement  physically 
are  the  extremely  parallel  and  powerful  interconnection  network,  and  the  completely  parallel  and 
contention-free  access  to  shared  memory;  both  of  these  aspects  are  potential  advantages  that 
optical  systems  have  over  electronic  systems. 


We  have  proceeded,  as  suggested  above,  to  use  these  models  to  develop  an  initial  design  for 
a new architecture for parallel optical/electronic computing. It is based on shared memory, an
optical interconnection network, and electronic processing. It can potentially use optics in
conjunction with appropriate control and routing techniques to achieve functionality and processing
power  heretofore  unachieved  with  optical  systems.  In  addition,  it  addresses  the  issue  of  what 
kinds  of  parallel  access  shared  memories  may  be  desirable,  and  how  they  might  be  addressed  and 
incorporated  into  an  architecture.  It  is  anticipated  that  this  work  will  continue  after  the  end  of 
this  grant. 


2.2  Incoherent  Optical  Neuron 

A  general  neuron  unit,  used  in  a  neural  network,  must  incorporate  both  positive  and  negative 
(excitatory  and  inhibitory)  inputs.  We  have  invented,  modeled,  simulated,  and  experimentally 
demonstrated  a  new  method  for  incorporating  both  types  of  inputs  in  an  optical  neuron  unit 
that  uses  only  incoherent  optical  devices.  This  incoherent  optical  neuron  (ION)  is  cascadable, 
can  be  used  efficiently  in  partially  as  well  as  fully  connected  neural  networks,  and  can  be  used 
with  either  coherent  or  incoherent  optical  interconnections.  This  work  was  also  supported  in  part 
by AFOSR under the University Research Initiative program.

3 Learning Algorithms

3.1  Potential  Difference  Learning 

We developed a new learning algorithm for self-organization in neural networks, potential
difference learning. It is based on a temporal difference of the neuron unit potential,

$$\Delta W_{ij} \propto \Delta p_i \, x_j$$

where $\Delta W_{ij}$ is the increment of the weight from neuron $j$ to neuron $i$, $\Delta p_i$ is the temporal difference of
potential, and $x_j$ is the $j$th input to neuron $i$. Depending on the time sequence of the input
patterns during learning, it can learn based on the input patterns themselves or based on the time
difference of input patterns. Unlike a strict Hebbian law, it has no weight overflow. It can, with
suitable presentation of the input patterns, also be used to unlearn or erase stored states in an
associative memory without access to the individual weights and without reversing the sign of
the learning gain constant.
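
The update rule can be illustrated with a minimal Python sketch. It is not the simulation code used in this work: the names `potential`, `pdl_step` and the gain `eta` are hypothetical, and the neuron potential is assumed here to be the weighted sum of the inputs, $p_i = \sum_j W_{ij} x_j$.

```python
def potential(W, x):
    """Assumed neuron potential: p_i = sum_j W[i][j] * x[j]."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def pdl_step(W, x, p_prev, eta=0.1):
    """One potential difference learning step.

    W      : weight matrix, W[i][j] is the weight from neuron j to neuron i
    x      : current input pattern
    p_prev : potentials at the previous time step
    eta    : learning gain constant (hypothetical name)
    Returns the updated weights and the current potentials.
    """
    p = potential(W, x)
    # Temporal difference of the potential, Delta p_i.
    dp = [p_i - q_i for p_i, q_i in zip(p, p_prev)]
    # Delta W_ij proportional to Delta p_i * x_j.
    W_new = [[w_ij + eta * dp_i * x_j for w_ij, x_j in zip(row, x)]
             for row, dp_i in zip(W, dp)]
    return W_new, p


W = [[0.5, -0.2], [0.1, 0.3]]
x = [1.0, 0.0]
# Potential rises from rest: the weights on active inputs grow.
W1, p = pdl_step(W, x, p_prev=[0.0, 0.0])
# Same pattern presented again with unchanged potential: no weight change.
W2, _ = pdl_step(W, x, p_prev=potential(W, x))
```

In this sketch a pattern that is held constant produces no further weight change, which is one way to read the statement that the rule does not overflow the weights.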


We have simulated potential difference learning on two different networks: (1) an Amari network,
i.e. a single layer fully connected network with feedback, used as an associative memory,
and  (2)  a  3-layer  network  used  as  two  associative  memories  with  a  hidden  layer  to  relate  pairs  of 
stored  vectors.  These  are  described  in  a  paper  attached  to  this  report. 


3.2  Stochastic  Learning  Networks  for  Computer  Vision 

We have developed stochastic learning networks for an important problem in computer vision,
viz., texture segmentation. Our approach is based on minimizing an energy function derived
by representing textures as Markov Random Fields (MRF). We use the Gauss
Markov Random Field (GMRF) to represent the texture intensities and an Ising model to
characterize the label distribution. We first used an adaptive Cohen-Grossberg/Hopfield network to
minimize the resulting energy function. The solution obtained is in general only a local optimum and
may not be satisfactory in many cases. Although stochastic algorithms like simulated annealing
have the potential of finding a global optimum, they are computationally expensive. We have
developed an alternate approach, based on the theory of learning automata, which introduces
stochastic learning into the iterations of the Hopfield network. This approach consists of a
two-stage process with learning and relaxation alternating with each other and, because of its
stochastic nature, has the potential of escaping local minima.


The learning part of the system consists of a team of automata $A_s$, one automaton for
each pixel site. Each automaton $A_s$ at site $s$ maintains a time varying probability vector
$p_s = [p_{s1}, \ldots, p_{sL}]$, where $p_{sk}$ is the probability of assigning the texture class $k$ to the pixel site $s$.
Initially all these probabilities are equal. At the beginning of each cycle the learning system
chooses a label configuration based on this probability distribution and presents it to the
Cohen-Grossberg/Hopfield neural network described above as an initial state. The neural network
then converges to a stable state. The probabilities for the labels in the stable configuration are
increased according to the following updating rule. Let $k_s$ be the label selected for the site
$s = (i, j)$ in the stable state in the $n$th cycle. Let $\lambda(n)$ denote a reinforcement signal received by
the learning system in that cycle. Then,

$$p_{s k_s}(n+1) = p_{s k_s}(n) + a \lambda(n) [1 - p_{s k_s}(n)]$$
$$p_{s j}(n+1) = p_{s j}(n) [1 - a \lambda(n)], \quad \forall j \neq k_s$$

for all $s = (i, j)$, $1 \le i, j \le M$.


In the above equations $a$ determines the learning rate of the system. The reinforcement signal
indicates whether the new state is good compared to the previous one in terms of the energy
function. Using the new probabilities, a new initial state is randomly generated for the relaxation
network and the process repeats. The above learning rule is called the Linear Reward-Inaction
rule in learning automata terminology.
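
As a minimal sketch of this update for a single pixel site, the following Python function applies the rule to one probability vector. The function name `lri_update` and its argument names are hypothetical, and the reinforcement signal is assumed to lie in [0, 1]; how that signal is computed from the energy function is not shown here.

```python
def lri_update(p, k, a, lam):
    """Linear reward-inaction update for one automaton.

    p   : list of label probabilities at this site (sums to 1)
    k   : index of the label selected in the stable state
    a   : learning rate
    lam : reinforcement signal, assumed to be in [0, 1]
    """
    return [
        p_j + a * lam * (1.0 - p_j) if j == k   # reward the selected label
        else p_j * (1.0 - a * lam)              # shrink all other labels
        for j, p_j in enumerate(p)
    ]


p = [0.25, 0.25, 0.25, 0.25]            # equal initial probabilities
rewarded = lri_update(p, k=2, a=0.1, lam=1.0)
inaction = lri_update(p, k=2, a=0.1, lam=0.0)
```

The two branches always sum to one, so the vector stays a probability distribution, and a zero reinforcement signal leaves it unchanged, which is the "inaction" part of the name.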


We have tested this algorithm in classifying some real textured images. The Hopfield network
solution has a misclassification error of about 14% without learning. The error decreased to 6.8%
when stochastic learning was introduced. When simulated annealing was tried the error rate was
6.3%, but the number of iterations was considerably larger. In general, stochastic algorithms seem
to perform better than any deterministic scheme.


Recently,  we  have  extended  our  approach  to  the  unsupervised  case.  In  this  method,  the  image 
is  divided  into  a  number  of  non-overlapping  regions  and  the  GMRF  parameters  are  computed 
from each of these regions. A simple clustering scheme is used to merge these regions. The
parameters of the model estimated from the clustered segments are then used in the deterministic
and stochastic algorithms mentioned earlier. The unsupervised texture segmentation
results are very encouraging.


Under partial support from this grant and the USC URI Center for the Integration of Optical
Computing, we have developed an adaptive neural network (NN) based algorithm for a
fundamental problem in image processing, viz., restoration of a blurred and noise corrupted image. In
this method, we have used an NN model to represent a possibly nonstationary image whose gray
level function is the simple sum of neuron state variables. The restoration procedure consists of
two stages: estimation of the parameters of the NN model and reconstruction of images. During
the first stage, the parameters are estimated by comparing the energy function of the network to
a constrained error function. The nonlinear restoration is then carried out iteratively in
the second stage by using a dynamic algorithm to minimize the energy function. We have also
developed a practical algorithm with reduced computational complexity. Comparisons to other
methods such as the Singular Value Decomposition (SVD) pseudoinverse filter, minimum mean
square error filter, and modified Minimum Mean Square Error (MMSE) filter using the GMRF
models showed that the NN based method performed the best.


We  have  extended  these  techniques  for  other  vision  problems  such  as  computation  of  optical 
flow,  static  and  motion  stereo.  The  network  for  computation  of  optical  flow  from  two  or  more 
image  frames  uses  rotation  invariant  features  known  as  principal  curvatures.  A  set  of  layers  of 
binary  neurons  is  used  to  represent  the  flow  field.  Each  neuron  receives  all  inputs  from  itself  and 
other  neurons  in  a  local  neighborhood  in  the  same  layer.  Computation  of  optical  flow  is  carried 
out  by  neuron  evaluation  using  a  parallel  updating  scheme.  Using  information  regarding  the 
occluding  elements,  the  network  automatically  locates  motion  discontinuities.  To  improve  the 
accuracy of the estimated flow field, two algorithms using multiple frames, batch and recursive,
are presented. The batch algorithm integrates information from all images simultaneously by
embedding  them  into  bias  inputs  of  the  network,  while  the  recursive  algorithm  uses  a  procedure 
to  update  the  bias  inputs  of  the  network.  Satisfactory  results  have  been  obtained  using  these 
methods  on  a  number  of  real  images. 


The network for static and motion stereo uses a similar approach, using first derivatives
estimated by filtering discrete orthogonal polynomials. The recursive algorithm for motion stereo
takes into account various situations, such as split motion and fusion motion, in obtaining updated
values of disparity.


A Novel Approach to Image Restoration
Based on a Neural Network¹

Y.  T.  Zhou,  R.  Chellappa  and  B.  K.  Jenkins 

Signal  and  Image  Processing  Institute 
Department  of  EE-Systems 
University  of  Southern  California 


Abstract 

A  novel  approach  for  restoration  of  gray  level  images  degraded  by  a  known  shift  invariant  blur 
function  and  additive  noise  is  presented  using  a  neural  computational  model.  A  neural  network  model 
is  employed  to  represent  an  image  whose  gray  level  function  is  the  simple  sum  of  the  neuron  state 
variables.  The  restoration  procedure  consists  of  two  stages:  estimation  of  the  parameters  of  the  neural 
network  model  and  reconstruction  of  images.  During  the  first  stage,  image  noise  is  suppressed  and 
the  parameters  are  estimated.  The  restoration  is  then  carried  out  iteratively  in  the  second  stage  by 
using  a  dynamic  algorithm  to  minimize  the  energy  function  of  an  appropriate  neural  network.  Owing 
to the model’s fault-tolerant nature and computation capability, a high quality image is obtained using
this  approach.  A  practical  algorithm  with  reduced  computational  complexity  is  also  presented.  Several 
computer  simulation  examples  involving  synthetic  and  real  images  are  given  to  illustrate  the  usefulness 
of  our  method. 


1  Introduction 

Image  restoration  is  an  important  problem  in  early  vision  processing  to  recover  an  ideal  high  quality 
image  from  a  degraded  recording.  Restoration  techniques  are  applied  to  remove  (1)  system  degradations 
such  as  blur  due  to  optical  system  aberrations,  atmospheric  turbulence,  motion  and  diffraction;  and  (2) 
statistical  degradations  due  to  noise.  Over  the  last  20  years,  various  methods  such  as  the  inverse  filter, 
Wiener  filter,  Kalman  filter,  SVD  pseudoinverse  and  many  other  model  based  approaches,  have  been 
proposed for image restoration. One of the major drawbacks of most image restoration algorithms
is their computational complexity, so much so that many simplifying assumptions have been made to obtain
computationally feasible algorithms. An artificial neural network system that can perform extremely
rapid parallel computation seems to be very attractive for image processing applications; preliminary
investigations of various problems such as pattern recognition and image processing are very promising
[1].

In  this  paper,  we  use  a  neural  network  model  containing  redundant  neurons  to  restore  gray  level 
images  degraded  by  a  known  shift  invariant  blur  function  and  noise.  It  is  based  on  the  model  described 
in [2] [3] using a simple sum number representation [4]. The image gray levels are represented by the
simple  sum  of  the  neuron  state  variables  which  take  binary  values  of  1  or  0.  The  observed  image  is 
degraded  by  a  shift-invariant  function  and  noise.  The  restoration  procedure  consists  of  two  stages: 
estimation  of  the  parameters  of  the  neural  network  model,  and  reconstruction  of  images.  During  the  first 
stage,  the  image  noise  is  suppressed  and  the  parameters  are  estimated.  The  restoration  is  then  carried 
out  by  using  a  dynamic  iterative  algorithm  to  minimize  the  energy  function  of  the  neural  network.  Owing 
to  the  model’s  fault-tolerant  nature  and  computation  capability,  a  high  quality  image  is  obtained  using 
our  approach.  We  illustrate  the  usefulness  of  this  approach  by  using  both  synthetic  and  real  images 
degraded  by  a  known  shift-invariant  blur  function  with  or  without  noise. 

1  This  research  work  is  partially  supported  by  the  AFOSR  Contract  No.  F-49620-87-C-0007. 
2 Image Representation Using a Neural Network


We use a neural network containing redundant neurons for representing the image gray levels. The model
consists of $L^2 \times M$ mutually interconnected neurons, where $L \times L$ is the size of the image and $M$ is the maximum
value of the gray level function. Let $V = \{v_{i,k},\ 1 \le i \le L^2,\ 1 \le k \le M\}$ be a binary state set of the
neural network with $v_{i,k}$ (1 for firing and 0 for resting) denoting the state of the $(i,k)$th neuron. Let
$T_{i,k;j,l}$ denote the strength (possibly negative) of the interconnection between neuron $(i,k)$ and neuron
$(j,l)$. We require symmetry

$$T_{i,k;j,l} = T_{j,l;i,k} \quad \text{for } 1 \le i,j \le L^2 \text{ and } 1 \le l,k \le M.$$

We also insist that the neurons have self-feedback, i.e. $T_{i,k;i,k} \neq 0$. In this model, each neuron $(i,k)$
randomly and asynchronously receives inputs from all neurons and a bias input $I_{i,k}$:

$$u_{i,k} = \sum_{j=1}^{L^2} \sum_{l=1}^{M} T_{i,k;j,l}\, v_{j,l} + I_{i,k}. \qquad (1)$$

Each $u_{i,k}$ is fed back to the corresponding neuron after thresholding,

$$v_{i,k} = g(u_{i,k}), \qquad (2)$$

where $g(x)$ is a nonlinear function whose form can be taken as

$$g(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{if } x < 0. \end{cases} \qquad (3)$$

In  this  model,  the  state  of  each  neuron  is  updated  by  using  the  latest  information  about  other  neurons. 

The image is described by a finite set of gray level functions $\{x(i,j),\ 1 \le i,j \le L\}$ with $x(i,j)$
(a positive integer) denoting the gray level of the cell $(i,j)$. The image gray level function can be
represented by a simple sum of the neuron state variables as

$$x(i,j) = \sum_{k=1}^{M} v_{m,k} \qquad (4)$$

where $m = i \times L + j$. Here the gray level functions have degenerate representations. Use of this redundant
number representation scheme yields advantages such as fault-tolerance and convergence to the solution
[4].
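
The degeneracy of the simple sum representation can be made concrete with a short Python sketch. The function names are hypothetical; the count of equivalent state vectors follows from choosing which of the $M$ binary neurons fire.

```python
from math import comb


def gray_level(states):
    """Gray level of one pixel as the simple sum of its M binary neuron states."""
    return sum(states)


def num_representations(M, x):
    """Number of distinct binary state vectors of length M whose sum is x."""
    return comb(M, x)


# Two different state vectors that encode the same gray level (M = 4).
a = [1, 1, 0, 0]
b = [0, 1, 0, 1]
same_level = gray_level(a) == gray_level(b)

# With M = 4 neurons, gray level 2 has comb(4, 2) equivalent encodings.
count = num_representations(4, 2)
```

Because many state vectors map to the same gray level, the failure or mis-setting of one neuron can be compensated by another, which is one reading of the fault-tolerance claim.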

If we scan the 2-D image by rows and stack them as a long vector, then the degraded image vector
can be written as

$$Y = HX + N \qquad (5)$$

where $H$ is the $L^2 \times L^2$ point spread function (or blur) matrix, and $X$, $Y$ and $N$ are the $L^2 \times 1$
original, degraded and noise vectors, respectively. This is similar to the simultaneous equations solution
of [4], but differs in that (5) includes a noise term.

The shift-invariant blur function can be written as a convolution over a small window; for instance,
it takes the form

$$h(k,l) = \begin{cases} \frac{1}{2} & \text{if } k = 0,\ l = 0 \\ \frac{1}{16} & \text{if } |k|, |l| \le 1,\ (k,l) \neq (0,0). \end{cases} \qquad (6)$$

Accordingly, the “blur matrix” $H$ will be a block Toeplitz matrix, or a block circulant matrix if the image has
periodic boundaries.
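
To illustrate how a small blur window leads to a structured blur matrix, the following Python sketch builds $H$ for a tiny image with periodic boundaries. It is an illustration only: the function names are hypothetical, pixels are indexed from zero with $m = i \times L + j$, and the window weights are taken as 1/2 at the center and 1/16 at each of the eight neighbours, which is an assumed reading of (6).

```python
def blur_kernel(k, l):
    """Assumed 3 x 3 blur window: 1/2 at the center, 1/16 on the 8 neighbours."""
    if k == 0 and l == 0:
        return 0.5
    if abs(k) <= 1 and abs(l) <= 1:
        return 1.0 / 16.0
    return 0.0


def blur_matrix(L):
    """L^2 x L^2 blur matrix for an L x L image with periodic boundaries.

    Pixels are stacked row by row, so pixel (i, j) has index i * L + j.
    """
    n = L * L
    H = [[0.0] * n for _ in range(n)]
    for i in range(L):
        for j in range(L):
            for dk in (-1, 0, 1):
                for dl in (-1, 0, 1):
                    ii, jj = (i + dk) % L, (j + dl) % L   # wrap around
                    H[i * L + j][ii * L + jj] += blur_kernel(dk, dl)
    return H


H = blur_matrix(4)   # 16 x 16 matrix for a 4 x 4 image
```

Each row of `H` holds only nine nonzero entries, and shifting the image by one row or one column permutes the rows and columns of `H` in the same way, which is the block circulant structure mentioned above.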


3  Estimation  of  Model  Parameters 

The neural model parameters, the interconnection strengths and the bias inputs, can be determined in
terms of the energy function of the neural network. As defined in [2], the energy function of the neural
network can be written as

$$E = -\frac{1}{2} \sum_{i=1}^{L^2} \sum_{j=1}^{L^2} \sum_{k=1}^{M} \sum_{l=1}^{M} T_{i,k;j,l}\, v_{i,k}\, v_{j,l} - \sum_{i=1}^{L^2} \sum_{k=1}^{M} I_{i,k}\, v_{i,k}. \qquad (7)$$

In order to use the spontaneous energy-minimization process of the neural network, we reformulate our
restoration problem as one of minimizing an energy function defined as

$$E = \frac{1}{2} \| Y - HX \|^2 \qquad (8)$$

where $\|Z\|$ is the $L_2$ norm of $Z$. By comparing the terms in the expansion of (8) with the corresponding
terms in (7), we can determine the interconnection strengths and the bias inputs as

$$T_{i,k;j,l} = -\sum_{p=1}^{L^2} h_{p,i}\, h_{p,j} \qquad (9)$$

and

$$I_{i,k} = \sum_{p=1}^{L^2} y_p\, h_{p,i}. \qquad (10)$$

From (9), one can see that the interconnection strengths are determined by the shift-invariant blur
function. Hence, $T_{i,k;j,l}$ can be computed without error provided the blur function is known. However,
the bias inputs are functions of the observation, the degraded image. If the image is degraded by a
shift-invariant blur function only, then $I_{i,k}$ can be estimated perfectly. Otherwise, the degraded image needs
to be preprocessed to suppress the noise if the signal to noise ratio (SNR), defined by

$$\mathrm{SNR} = 10 \log_{10} \frac{\sigma_s^2}{\sigma_n^2} \qquad (11)$$

where $\sigma_s^2$ and $\sigma_n^2$ are the variances of signal and noise, respectively, is low.
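
The correspondence between (7) and (8) can be checked numerically with a small Python sketch. It works at the pixel level, using the fact that the parameters in (9) and (10) do not depend on the neuron indices $k$ and $l$, so the sums over those indices collapse to the gray levels. The function names and the 3-pixel example are hypothetical.

```python
def network_parameters(H, y):
    """Pixel-level interconnection strengths and bias inputs from (9) and (10)."""
    rows, n = len(H), len(H[0])
    T = [[-sum(H[p][i] * H[p][j] for p in range(rows)) for j in range(n)]
         for i in range(n)]
    I = [sum(y[p] * H[p][i] for p in range(rows)) for i in range(n)]
    return T, I


def network_energy(T, I, x):
    """Energy (7) with the sums over k and l collapsed to gray levels x."""
    n = len(x)
    quad = sum(T[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    lin = sum(I[i] * x[i] for i in range(n))
    return -0.5 * quad - lin


def restoration_error(H, x, y):
    """Energy (8): half the squared L2 norm of y - Hx."""
    r = [y[p] - sum(H[p][i] * x[i] for i in range(len(x))) for p in range(len(H))]
    return 0.5 * sum(r_p * r_p for r_p in r)


H = [[0.5, 0.25, 0.0], [0.25, 0.5, 0.25], [0.0, 0.25, 0.5]]
x = [3, 1, 2]
y = [2.0, 2.0, 1.0]
T, I = network_parameters(H, y)
# The two energies differ only by the constant 0.5 * ||y||^2.
gap = restoration_error(H, x, y) - network_energy(T, I, x)
const = 0.5 * sum(v * v for v in y)
```

The constant term does not depend on the image estimate, so minimizing the network energy and minimizing (8) select the same image.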


4  Restoration 


Restoration is carried out by the neuron evaluation and image construction procedure. Once the
parameters $T_{i,k;j,l}$ and $I_{i,k}$ are obtained using (9) and (10), each neuron can randomly and asynchronously
evaluate its state and readjust accordingly using (1) and (2). When one quasi-minimum energy point is
reached, the image can be constructed by (4).

However, this neural network has self-feedback, i.e. $T_{i,k;i,k} \neq 0$; as a result, the energy
function $E$ does not always decrease monotonically with a transition. This is explained as follows. Define the state change
$\Delta v_{i,k}$ of neuron $(i,k)$ and the energy change $\Delta E$ as

$$\Delta v_{i,k} = v_{i,k}^{new} - v_{i,k}^{old} \quad \text{and} \quad \Delta E = E^{new} - E^{old}.$$

Consider the energy function

$$E = -\frac{1}{2} \sum_{i=1}^{L^2} \sum_{j=1}^{L^2} \sum_{k=1}^{M} \sum_{l=1}^{M} T_{i,k;j,l}\, v_{i,k}\, v_{j,l} - \sum_{i=1}^{L^2} \sum_{k=1}^{M} I_{i,k}\, v_{i,k}. \qquad (12)$$

Then the change $\Delta E$ due to a change $\Delta v_{i,k}$ is given by

$$\Delta E = -\Big( \sum_{j=1}^{L^2} \sum_{l=1}^{M} T_{i,k;j,l}\, v_{j,l} + I_{i,k} \Big) \Delta v_{i,k} - \frac{1}{2} T_{i,k;i,k} (\Delta v_{i,k})^2 \qquad (13)$$

which is not always negative. For instance, if

$$v_{i,k}^{old} = 0, \quad u_{i,k} = \sum_{j=1}^{L^2} \sum_{l=1}^{M} T_{i,k;j,l}\, v_{j,l} + I_{i,k} > 0,$$

and the threshold function is as in (3), then $v_{i,k}^{new} = 1$ and $\Delta v_{i,k} > 0$. Thus, the first term in (13) is
negative. But

$$T_{i,k;i,k} = -\sum_{p=1}^{L^2} h_{p,i}^2 < 0,$$

leading to

$$-\frac{1}{2} T_{i,k;i,k} (\Delta v_{i,k})^2 > 0.$$

When the magnitude of the first term is less than that of the second term in (13), then $\Delta E > 0$ (we have observed this in our
experiments).

Thus, depending on whether convergence to a local minimum or a global minimum is desired, we can
design a deterministic or stochastic decision rule. The deterministic rule is to take a new state $v_{i,k}^{new}$ of
neuron $(i,k)$ if the energy change $\Delta E$ due to the state change $\Delta v_{i,k}$ is less than zero. If $\Delta E$ due to the state
change is $\ge 0$, no state change is effected. We have also designed a stochastic rule similar to the one
used in simulated annealing techniques [5] [6]. The details of this stochastic scheme are given as follows.
Define a Boltzmann distribution by

$$\frac{p_{new}}{p_{old}} = e^{-\Delta E / T}$$

where $p_{new}$ and $p_{old}$ are the probabilities of the new and old global state, respectively, $\Delta E$ is the energy
change and $T$ is the parameter which acts like temperature. A new state $v_{i,k}^{new}$ is taken if

$$\frac{p_{new}}{p_{old}} > 1, \quad \text{or if} \quad \frac{p_{new}}{p_{old}} \le 1 \ \text{but} \ \frac{p_{new}}{p_{old}} > \xi$$

where $\xi$ is a random number uniformly distributed in the interval [0,1].
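
The two decision rules can be sketched in Python as follows. This is a minimal illustration with hypothetical function names; it assumes the Boltzmann ratio has the usual form $e^{-\Delta E / T}$ and uses Python's `random.random()`, which draws from [0, 1).

```python
import math
import random


def accept_deterministic(delta_E):
    """Deterministic rule: take the new state only if the energy decreases."""
    return delta_E < 0


def accept_stochastic(delta_E, T, rng=random):
    """Stochastic rule: always accept a decrease, and accept an increase
    with probability exp(-delta_E / T)."""
    if delta_E < 0:
        # The ratio p_new / p_old exceeds 1, so the new state is taken.
        return True
    # Otherwise compare the ratio with a uniform random number.
    return math.exp(-delta_E / T) > rng.random()


rng = random.Random(0)
# Fraction of uphill moves (delta_E = 1) accepted at temperature T = 1.
trials = 10000
accepted = sum(accept_stochastic(1.0, 1.0, rng) for _ in range(trials)) / trials
```

At high temperature most uphill moves are accepted, while as the temperature parameter approaches zero the stochastic rule approaches the deterministic one.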

The  restoration  algorithm  can  then  be  summarized  as 

1.  Set  the  initial  state  of  the  neurons. 

2.  Update  the  state  of  all  neurons  randomly  and  asynchronously  according  to  the  decision  rule. 

3.  Check  the  energy  function;  if  energy  does  not  change  anymore,  go  to  next  step;  otherwise,  go  back 
to  step  2. 

4.  Construct  an  image  using  (4). 


5  A  Practical  Algorithm 

The algorithm described above is difficult to simulate on a conventional computer due to high
computational complexity even for images of reasonable size. For instance, if we have an $L \times L$ image with
$M$ gray levels, then $L^2 M$ neurons and $O(L^4 M^2)$ interconnections are required and $L^4 M^2$ additions and
multiplications are needed at each iteration. Therefore, the space and time complexities are $O(L^4 M^2)$
and $O(L^4 M^2 K)$, respectively, where $K$, of order $O(10)$-$O(100)$, is the number of iterations. When $L = 256$ and
$M = 256$, the space and time complexities will be $O(10^{14})$ and $O(10^{15})$-$O(10^{16})$, respectively. However,
simplification is possible if the neurons are sequentially updated.
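
The orders of magnitude quoted for $L = 256$ and $M = 256$ can be reproduced with a few lines of Python. The variable names are hypothetical, and $K$ is taken at the two ends of the range 10 to 100.

```python
L, M = 256, 256

# Storage for the full network grows as L^4 * M^2 interconnections.
interconnections = L**4 * M**2

# Operation count over a whole run, for K = 10 and K = 100 iterations.
operations = [interconnections * K for K in (10, 100)]

print(f"space: {interconnections:.1e}")
print("time:  " + " to ".join(f"{ops:.1e}" for ops in operations))
```

Since $256 = 2^8$, the interconnection count is exactly $2^{48}$, roughly $2.8 \times 10^{14}$, which is far beyond what a conventional computer of the period could store.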

In order to simplify the algorithm, we begin by reconsidering (1) and (2) of the neural network. Noting
that the interconnection strengths given in (9) are independent of subscripts $k$ and $l$ and the bias inputs
given in (10) are independent of subscript $k$, the $M$ neurons used to represent the same image gray level
function have the same interconnection strengths and bias inputs. Hence, one set of interconnection
strengths and one bias input are sufficient for every gray level function, i.e. the dimensions of the
interconnection matrix $T$ and bias input matrix $I$ can be reduced by a factor of $M^2$. From (1) all inputs
received by a neuron, say, the $(i,k)$th neuron can be written as

$$u_{i,k} = \sum_{j=1}^{L^2} T_{i,\cdot;j,\cdot} \Big( \sum_{l=1}^{M} v_{j,l} \Big) + I_{i,\cdot} = \sum_{j=1}^{L^2} T_{i,\cdot;j,\cdot}\, x_j + I_{i,\cdot} \qquad (14)$$

where we have used (4) and $x_j$ is the gray level function of the $j$th image pixel. Equation (14) suggests
that we can use a multivalue number to replace the simple sum number. Since the interconnection
strengths are determined by the blur function only as shown in (9), it is easy to see that if the blur
function is local, then most interconnection strengths are zeros so that the neurons are locally connected.
Therefore, most elements of the interconnection matrix $T$ are zeros. If the blur function is shift invariant
taking the form in (6), then the interconnection matrix is block Toeplitz so that only a few elements need
to be stored. Based on the value of inputs $u_{i,k}$, the state of the $(i,k)$th neuron is updated by applying a
decision rule. The state change of the $(i,k)$th neuron in turn causes the gray level function $x_i$ to change

$$x_i^{new} = \begin{cases} x_i^{old} & \text{if } \Delta v_{i,k} = 0 \\ x_i^{old} + 1 & \text{if } \Delta v_{i,k} = 1 \\ x_i^{old} - 1 & \text{if } \Delta v_{i,k} = -1 \end{cases} \qquad (15)$$

where $\Delta v_{i,k} = v_{i,k}^{new} - v_{i,k}^{old}$ is the state change of the $(i,k)$th neuron. The superscripts “new” and “old”
are for after and before updating, respectively. We use $x_i$ to represent the gray level value as well as
the output of $M$ neurons representing $x_i$. Assuming that the neurons of the network are sequentially
visited, it is straightforward to prove that the updating procedure can be reformulated as

$$u_{i,k} = \sum_{j=1}^{L^2} T_{i,\cdot;j,\cdot}\, x_j + I_{i,\cdot} \qquad (16)$$

$$\Delta v_{i,k} = g(u_{i,k}) = \begin{cases} \Delta v_{i,k} = 0 & \text{if } u_{i,k} = 0 \\ \Delta v_{i,k} = 1 & \text{if } u_{i,k} > 0 \\ \Delta v_{i,k} = -1 & \text{if } u_{i,k} < 0 \end{cases} \qquad (17)$$

$$x_i^{new} = \begin{cases} x_i^{old} + \Delta v_{i,k} & \text{if } \Delta E < 0 \\ x_i^{old} & \text{if } \Delta E \ge 0. \end{cases} \qquad (18)$$

Note that the stochastic decision rule can also be used in (18). In order to limit the gray level function to
the range 0 to 255, after each updating step we have to check the value of the gray level function $x_i^{new}$.
Equations (16), (17) and (18) give a much simpler algorithm. This algorithm is summarized below:


1. Take the degraded image as the initial value.

2. Sequentially visit all numbers (image pixels). For each number, use (16), (17) and (18) to update
it repeatedly until no further change occurs, i.e. if $\Delta v_{i,k} = 0$ or the energy change $\Delta E \ge 0$, then move to
the next one.

3. Check the energy function; if the energy does not change anymore, a restored image is obtained;
otherwise, go back to step 2 for another iteration.

The calculations of the inputs $u_{i,k}$ of the (i, k)th neuron and the energy change $\Delta E$ can be simplified
further. When we update the same image gray level function repeatedly, the inputs received by the
current neuron (i, k) can be computed by making use of the previous result

$$u_{i,k} = u_{i,k-1} + \Delta v_{i,k}\, T_{i,\cdot;i,\cdot} \tag{19}$$

where $u_{i,k-1}$ is the input received by the (i, k - 1)th neuron. The energy change $\Delta E$ due to the state
change of the (i, k)th neuron can be calculated as

$$\Delta E = -u_{i,k}\, \Delta v_{i,k} - \frac{1}{2}\, T_{i,\cdot;i,\cdot}\, (\Delta v_{i,k})^2 \tag{20}$$
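As a small numerical check of the incremental input (19) and the energy change (20), the sketch below compares both against a direct recomputation. It assumes a quadratic energy of the form $E = -\frac{1}{2} x^T T x - I^T x$ with a symmetric matrix T; the matrix, bias vector, and function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def input_and_energy_change(T, I, x, i, dv):
    """Return (u, dE) for changing pixel i by dv, following (16) and (20)."""
    u = T[i] @ x + I[i]
    dE = -u * dv - 0.5 * T[i, i] * dv ** 2
    return u, dE

def energy(T, I, x):
    """Assumed quadratic energy E = -1/2 x^T T x - I^T x."""
    return -0.5 * x @ T @ x - I @ x

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 6))
T = -(A.T @ A)                      # hypothetical symmetric interconnection matrix
I = rng.normal(size=6)              # hypothetical bias inputs
x = rng.integers(0, 256, size=6).astype(float)

i, dv = 2, 1.0
u, dE = input_and_energy_change(T, I, x, i, dv)

x_new = x.copy()
x_new[i] += dv
u_next_full = T[i] @ x_new + I[i]   # recomputed from scratch, as in (16)
u_next_incr = u + dv * T[i, i]      # incremental form, as in (19)
```

Under these assumptions the incremental input and the closed-form energy change agree with the values obtained by recomputing everything, which is what makes the repeated per-pixel update cheap.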

If the blur function is shift invariant, all these simplifications reduce the space and time complexities
significantly, from $O(L^4 M^2)$ and $O(L^4 M^2 K)$ to $O(L^2)$ and $O(M L^2 K)$, respectively. Since every gray
level function needs only a few updating steps after the first iteration, the computation at each iteration is
$O(L^2)$. The resulting algorithm can be easily simulated on minicomputers for images as large as 512 x 512.
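The three-step procedure built from (16) to (20) can be sketched as follows. This is a minimal illustration, not the authors' program: it assumes the energy is the plain square error $\frac{1}{2}\|y - Hx\|^2$, so that $T = -H^T H$ and $I = H^T y$ (any smoothing constraint is omitted), and it uses a hypothetical 1-D circulant blur. The names `restore` and `energy` are illustrative.

```python
import numpy as np

def energy(x, y, H):
    """Square error 0.5 * ||y - H x||^2 (assumed form of the energy)."""
    r = y - H @ x
    return 0.5 * float(r @ r)

def restore(y, H, lo=0.0, hi=255.0, max_iters=100):
    T = -(H.T @ H)                      # assumed interconnection strengths
    I = H.T @ y                         # assumed bias inputs
    x = np.clip(np.rint(y), lo, hi)     # step 1: start from the degraded image
    for iteration in range(1, max_iters + 1):
        changed = False
        for i in range(x.size):         # step 2: visit pixels sequentially
            u = T[i] @ x + I[i]         # input, as in (16)
            while True:
                dv = np.sign(u)                           # decision rule (17)
                dE = -u * dv - 0.5 * T[i, i] * dv * dv    # energy change (20)
                if dv == 0 or dE >= 0 or not (lo <= x[i] + dv <= hi):
                    break
                x[i] += dv              # accept the change, as in (18)
                u += dv * T[i, i]       # incremental input (19)
                changed = True
        if not changed:                 # step 3: energy no longer changes
            break
    return x, iteration

# Hypothetical 1-D example with a circulant 3-tap blur.
n = 16
H = np.zeros((n, n))
for r in range(n):
    for off, c in zip((-1, 0, 1), (0.2, 0.6, 0.2)):
        H[r, (r + off) % n] = c
x_true = np.array([0, 0, 0, 128, 128, 128, 128, 255,
                   255, 255, 255, 128, 128, 0, 0, 0], dtype=float)
y = H @ x_true
x0 = np.clip(np.rint(y), 0, 255)
x_hat, iters = restore(y, H)
```

Because a change is accepted only when $\Delta E < 0$, the energy of the result can never exceed that of the starting image, and the range check keeps every gray level within 0 to 255.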


6  Computer  Simulations 

The practical algorithm described in the previous section was applied to synthetic and real images
on a Sun-3/160 workstation. In all cases, only the deterministic decision rule was used. The results are
summarized in Figures 1 and 2.

Figure 1 shows the results for the synthetic image. The original image shown in Figure 1(a) is of
size 32 x 32 with 3 gray levels. The image was degraded by convolving with a 3 x 3 blur function as in
(6) using a circulant boundary condition; 22 dB white Gaussian noise was added after convolution. A
perfect image was obtained after 6 iterations without preprocessing. We set the state of all neurons
equal to 1, i.e. firing, as the initial condition.
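The degradation used for the synthetic test can be sketched as below. This is an illustration under stated assumptions: the 3 x 3 blur is taken to be uniform (the actual coefficients come from (6), which is not reproduced here), the circulant boundary condition is realized with the 2-D FFT, and the SNR is taken as 10 log10 of the ratio of blurred-image variance to noise variance. The function name `degrade` and the random test image are hypothetical.

```python
import numpy as np

def degrade(x, h, snr_db, rng):
    """Circularly convolve x with kernel h, then add white Gaussian noise."""
    k = h.shape[0]
    padded = np.zeros(x.shape, dtype=float)
    padded[:k, :k] = h
    # Shift the kernel so that its centre sits at the origin.
    padded = np.roll(padded, (-(k // 2), -(k // 2)), axis=(0, 1))
    blurred = np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(padded)))
    noise_std = np.sqrt(blurred.var() / 10.0 ** (snr_db / 10.0))
    noisy = blurred + rng.normal(0.0, noise_std, size=x.shape)
    return blurred, noisy

rng = np.random.default_rng(0)
x = rng.integers(0, 3, size=(32, 32)).astype(float) * 100.0  # 3 gray levels
h = np.full((3, 3), 1.0 / 9.0)                               # assumed uniform blur
blurred, noisy = degrade(x, h, 22.0, rng)
```

With a circulant boundary condition the blur matrix is block circulant, which is why the convolution can be evaluated with FFTs in this sketch.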

Figure 2(a) shows the original girl image. The original image is of size 256 x 256 with 256 gray levels.
The variance of the original image is 2826.128. It was degraded by a 5 x 5 uniform blur function. A
small amount of quantization noise was introduced by quantizing the convolution results to 8 bits. The
noisy blurred image is shown in Figure 2(b). For comparison, Figure 2(c) shows the output of
an inverse filter [7], which is completely overridden by the amplified noise and the ringing effects due to the
ill-conditioning of the blur matrix H. Since the blur matrix H corresponding to the 5 x 5 uniform blur
function is not singular, the pseudoinverse filter [7] and the inverse filter have the same output. The
image restored by our approach is shown in Figure 2(d). In order to eliminate the ringing effect
due to the boundary conditions, we took the 4 pixel wide boundaries from the original image and updated
only the interior region (248 x 248) of the image. The blurred image was used as the initial condition
to accelerate the convergence. The total number of iterations was 213 (when the energy function did
not change anymore). The square error (i.e. energy function) defined in (8) is 0.02543 and the square
error between the original and restored images is 66.5027.
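The remark that the 5 x 5 uniform blur matrix is ill conditioned but not singular can be illustrated numerically. The sketch below assumes a circulant boundary condition, in which case the eigenvalues of H are the 2-D DFT of the kernel; the experiment above may have used a different boundary treatment, so this is only indicative. The function name is hypothetical.

```python
import numpy as np

def uniform_blur_eigenvalues(n, width):
    """Eigenvalue magnitudes of an n*n x n*n block-circulant uniform blur."""
    kernel = np.zeros(n)
    kernel[:width] = 1.0 / width
    kernel = np.roll(kernel, -(width // 2))   # centre the kernel at the origin
    lam = np.fft.fft(kernel)                  # eigenvalues of the 1-D circulant
    return np.abs(np.outer(lam, lam))         # separable 2-D blur: products

mags_256 = uniform_blur_eigenvalues(256, 5)
mags_255 = uniform_blur_eigenvalues(255, 5)
cond_256 = mags_256.max() / mags_256.min()
```

For a 256-point grid no eigenvalue vanishes, so the inverse exists but amplifies noise strongly at the weakest frequencies; for a grid whose size is a multiple of 5, such as 255, some eigenvalues are zero and the matrix is singular.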


7  Conclusion 

This paper has introduced a novel approach to restore gray level images degraded by a shift invariant
blur function and additive noise. The restoration procedure consists of two steps: parameter estimation
and image reconstruction. In order to reduce the computational complexity, a practical algorithm which
is equivalent to the original one is developed under the assumption that the neurons are sequentially
visited. The image is generated iteratively by updating the neurons representing the image gray levels
via a simple sum scheme. As no matrices are inverted, the serious problems of ringing due to the
ill-conditioned blur matrix H and of noise overriding caused by the inverse filter or pseudoinverse filter
can be avoided. For the case of 2-D uniform blur plus noise, the neural network based approach gives
high quality images, whereas the inverse filter and pseudoinverse filter yield poor results. We see from the
experimental results that the error defined by (8) is small while the error between the original image and
the restored image is relatively large. This is because the neural network decreases energy according to
(8) only. Another reason is that when the blur matrix is singular or nearly singular, the mapping from X
to y is not one to one; therefore, the error measure (8) is no longer reliable. Thus, we have to point
out that our approach will not work very well when the blurring matrix is singular. In our experiments,
when  the  window  size  of  a  uniform  blur  function  is  3  x  3,  the  ringing  effect  was  eliminated  by  leaving 
the  boundaries  of  the  degraded  image  without  processing.  When  the  window  size  is  5  x  5,  the  ringing 
effect was significantly reduced by using the original image boundaries.

(a) Original image.
(b) Degraded image.
(c) Results after 6 iterations.

Figure 1: Restoration of noisy blurred synthetic image.

(a) Original girl image.
(b) Image degraded by 5 x 5 uniform blur.
(c) Restored image using inverse filter.
(d) Restored image using our approach.

Figure 2: Restoration of noisy blurred real image and comparison.

ABSTRACT

A  learning  algorithm  based  on  temporal  difference  of  membrane  potential  of  the  neuron  is  proposed  for 
self-organizing  neural  networks.  It  is  independent  of  the  neuron  nonlinearity,  so  it  can  be  applied  to  analog  or 
binary  neurons.  Two  simulations  for  learning  of  weights  are  presented;  a  single  layer  fully-connected  network 
and  a  3-layer  network  with  hidden  units  for  a  distributed  semantic  network.  The  results  demonstrate  that  this 
potential  difference  learning  (PDL)  can  be  used  with  neural  architectures  for  various  applications.  Unlearning 
based  on  PDL  for  the  single  layer  network  is  also  discussed.  Finally,  an  optical  implementation  of  PDL  is 
proposed. 

1.  INTRODUCTION 

Most of the unsupervised learning algorithms are based on Hebb's hypothesis [1], which depends on the
correlated activity of the pre- and postsynaptic nerve cells. For steady input patterns, Hebb's rule will suffer
from weight overflow. Von der Malsburg (1973) [2] solved this by adding the constraint that the sum of the
weights of a neuron is constant. This concept led to competitive learning, developed by Grossberg (1976) [3],
and Rumelhart and Zipser (1985) [4]. They also assumed a winner-take-all algorithm (Fukushima 1975 [5]) to
enhance the synaptic weight modification between neurons. Biologically, the sum of the weights of a neuron
can  likely  be  changed  by  the  supply  of  some  chemical  substance.  In  this  paper  we  propose  a  learning  algorithm, 
potential  difference  learning  (PDL),  based  on  temporal  difference  of  the  neuron  membrane  potential.  Because 
PDL  is  based  on  the  membrane  potential,  it  is  independent  of  the  nonlinear  threshold  function  of  the  neuron.  Its 
temporal  characteristic  prevents  weight  overflow  and  permits  unlearning  without  access  to  individual  weights. 

In  an  artificial  neural  system,  unlearning  can  provide  for  real  time  reprogramming  and  modification  of 
the  distributed  storage  for  stable  recollection,  or  equivalently,  modification  of  the  energy  surface  in  an  energy 
minimization  problem.  Hopfield  proposed  unlearning  to  reduce  the  accessibility  of  spurious  states  [6].  Our 
unlearning  emphasizes  reprogrammability  and  local  modification  of  the  energy  surface  for  stable  partial  retrieval. 
The  unlearning  in  PDL  is  done  by  presenting  a  sequence  of  patterns  and  global  gain  control;  reversing  the  sign 
of  the  learning  gain  is  not  necessary.  The  distinction  of  learning  and  unlearning  in  PDL  is  in  the  data  sequence 
and  value  of  the  gain  constant  for  different  phases. 

The  main  advantages  of  potential  difference  learning  are  spontaneous  learning  without  weight  overflow 
for  steady  state  input  patterns  and  unlearning.  Other  features  of  PDL  include  contrast  learning,  temporally 
correlated  and  uncorrelated  learning,  learning  independently  of  neuron  type  and  ease  of  physical  implementation. 

2. POTENTIAL DIFFERENCE LEARNING AND ITS PROPERTIES

Like most learning rules, potential difference learning requires only local information for synapse
modification. Given a neuron with n inputs, PDL is given by:

$$w(k+1) = \Phi[\,w(k) + K_a a^{-1}(k)\, \Delta p(k)\, x(k)\,] \tag{1}$$

$$\Delta p(k) = w^T(k)\, x(k) - w^T(k-1)\, x(k-1) \tag{2}$$

$$y(k) = \Psi[\,w^T(k)\, x(k) - \theta(k)\,] \tag{3}$$

* Presented at SPIE's O-E LASE '88, Los Angeles, California, 10-15 January 1988, "Neural Network Models for
Optical Computing," SPIE vol. 882.

where w(k) and x(k) are the input weights and stimuli, respectively; they are represented as n x 1 vectors.
p(k) and y(k) are the neuron potential and output value at time instant k. $\theta(k)$ is the threshold of the neuron
and $K_a a^{-1}(k)$ denotes the learning gain constant, with $K_a$ as the global gain constant and $a^{-1}(k)$ as the adaptive
gain constant. The weight w(k) is bounded by the function $\Phi(\cdot)$, which represents the physical limitation of
synapses. Distinct from other learning models, PDL is independent of the output nonlinear function $\Psi(\cdot)$ of the
neuron.
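A single-neuron sketch of the update in (1) and (2) may help make the rule concrete. It assumes that the bounding function $\Phi$ is a hard clip to a symmetric range and that the gain $K_a a^{-1}(k)$ is a constant 0.01; both choices, the function name `pdl_step`, and the sample vectors are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def pdl_step(w, w_prev, x, x_prev, gain=0.01, w_max=1.0):
    """One PDL update; returns the new weights and the potential difference."""
    dp = w @ x - w_prev @ x_prev                        # potential difference (2)
    w_next = np.clip(w + gain * dp * x, -w_max, w_max)  # update (1), Phi as a clip
    return w_next, dp

x = np.array([1.0, -1.0, 1.0, 1.0])        # steady bipolar input pattern
w_prev = np.array([0.1, 0.0, -0.05, 0.2])  # hypothetical initial weights
w = w_prev.copy()
x_prev = np.zeros(4)                       # no input before the first step

history = []
for _ in range(20):
    w_next, dp = pdl_step(w, w_prev, x, x_prev)
    history.append(dp)
    w_prev, w, x_prev = w, w_next, x
```

For a steady input the potential difference shrinks geometrically in this sketch (by the factor gain times the squared norm of x), so the weights settle instead of growing without bound.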

PDL  has  the  following  properties: 

•  (1).  Self-organization:  similar  to  Hebb’s  rule,  PDL  can  modify  the  weights  of  synapses  according  to  the 
input  patterns. 

•  (2).  Contrast  learning:  Weight  modification  is  initiated  by  potential  difference,  which  is  caused  by 
difference  in  input.  Since  most  sensory  preprocessing  is  differential  in  nature,  PDL  can  provide  a  good 
approach  for  feature  extraction  when  it  is  combined  with  neural  architectures. 

•  (3).  Unlearning:  This  can  be  used  to  erase  stored  states,  to  alter  the  energy  surface  or  reprogram  a
network.  PDL  can  provide  unlearning  capability  by  applying  suitable  pattern  sequences  to  generate  a
negative  potential  difference  $\Delta p$.  This  is  discussed  below.

•  (4).  Temporally  correlated  or  uncorrelated  learning:  By  varying  the  training  sequences,  the  neuron 
can  learn  absolute  patterns  or  temporal  differences  between  training  patterns.  The  temporal  difference 
property  may  be  useful  in  sensory  information  processing. 

• (5). Ease of implementation: The PDL uses only local information to update the weights and only one
differencer per neuron is needed to calculate the potential difference. The complexity is low when it is
compared with differential Hebbian learning (Kosko 1986) [8] or drive-reinforcement learning (Klopf 1986).

• (6). Independence from neuron nonlinearity: The learning rule is evoked by the potential change only, so
various non-linear functions can be imposed on the neuron to make different types of neurons. Due to
this feature, weight modification can still occur when the output is saturated or clamped as long as the
weight is not saturated.

A variant of PDL is given by

$$w(k+1) = \Phi[\,w(k) + K_a a^{-1}(k) \cdot \Delta y(k) \cdot x(k)\,] \tag{4}$$

which replaces the potential difference $\Delta p(k)$ with the output difference $\Delta y(k)$. This can be used when the neuron
potential is not physically available. The tradeoff is that weight modification no longer occurs when the neuron
output is saturated. Equation (4) is similar in appearance to supervised learning (Widrow-Hoff rule), but here
$\Delta y(k)$ refers to the temporal difference of the neuron output, instead of the spatial output error.

Other learning algorithms have been proposed based on the following:

$$w(k+1) = \Phi[\,w(k) + K_a a^{-1}(k) \cdot \Delta y(k) \cdot \Delta x(k)\,] \tag{5}$$

with $\Delta$ representing different forms of temporal difference [7], [8], [9]. The use of $\Delta x(k)$ instead of x(k) and
more complex definitions of time average in these learning rules causes a higher implementation complexity.

Due  to  the  fact  that  the  PDL  rule  is  embedded  in  the  neurons,  we  need  some  lateral  interconnections  between 
the  neurons  of  the  same  layer  to  enhance  the  competitive  or  cooperative  modification  of  synapses.  One  example 
is  to  use  the  winner-take-all  algorithm  [5]: 

$$w_j(k+1) = \Phi[\,w_j(k) + K_a a^{-1}(k)\, \Delta p_j(k) \cdot x_j(k)\, \delta(y_j(k))\,] \tag{6}$$

where the subscript j denotes neuron j, and $\delta[y_j(k)] = 1$ if the jth neuron wins in its neighborhood; otherwise
it is zero.
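The gating in (6) can be sketched for a small layer as follows. The sketch assumes that the neighborhood is the whole layer, that the winner is the neuron with the largest potential (used here as a stand-in for the largest output), and that $\Phi$ is a hard clip; the function name and the sample weights are hypothetical.

```python
import numpy as np

def wta_pdl_step(W, p_prev, x, gain=0.01, w_max=1.0):
    """PDL update in which only the winning neuron modifies its weights."""
    p = W @ x                          # potentials of all neurons in the layer
    dp = p - p_prev                    # potential differences
    winner = int(np.argmax(p))         # assumed winner-take-all selection
    W_next = W.copy()
    W_next[winner] = np.clip(W[winner] + gain * dp[winner] * x, -w_max, w_max)
    return W_next, p, winner

W = np.array([[0.1, 0.0, 0.0, 0.0],
              [0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.2, 0.0]])   # one row of weights per neuron
x = np.ones(4)
W_next, p, winner = wta_pdl_step(W, np.zeros(3), x)
```

Only the row belonging to the winning neuron changes; the losing neurons keep their weights, which is the competitive effect the lateral interconnections are meant to provide.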


3.  COMPUTER  SIMULATIONS 


First, a single layer fully interconnected neural network, as described by Amari [10], Hopfield [11] and others,
is simulated. Four input patterns [12], each 20 bits long, are presented to the external input of the network. The
learning rule is our PDL with $K_a = 0.01$ and $a^{-1}(k) = 1$. The neurons are binary with bipolar coding (+1, -1).
The weights are initially set to zero. For each iteration, the four inputs were presented in sequence. Four
iterations were performed. The resulting weight matrix is shown in Fig. 1(a). PDL produces a nearly symmetric
weight matrix, which is quite similar to the result obtained using the familiar sum of outer products, as shown
in Fig. 1(b). If a partial input is applied to the trained network, we can get full retrieval after several iterations,
depending on the Hamming distance of the partial input from the stored pattern.

Fig. 1 Comparison Between Outer Product T(i,j) Matrix and PDL
(20 neurons, 4 patterns)
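For reference, the sum-of-outer-products matrix used as the comparison in Fig. 1(b), and retrieval from a corrupted input, can be sketched as below. The four 20-bit patterns here are hypothetical (built from a 4 x 4 Hadamard matrix so that they are mutually orthogonal); they are not the patterns of [12]. Synchronous thresholding is also an assumption.

```python
import numpy as np

H4 = np.array([[1,  1,  1,  1],
               [1, -1,  1, -1],
               [1,  1, -1, -1],
               [1, -1, -1,  1]])
patterns = np.kron(H4, np.ones(5, dtype=int))   # four bipolar patterns, 20 bits

T = patterns.T @ patterns                       # sum of outer products
np.fill_diagonal(T, 0)                          # no self-connections

def recall(T, x, max_steps=10):
    """Iterate the thresholded network until the state stops changing."""
    x = x.copy()
    for _ in range(max_steps):
        x_new = np.where(T @ x >= 0, 1, -1)
        if np.array_equal(x_new, x):
            break
        x = x_new
    return x

probe = patterns[3].copy()
probe[[0, 7]] *= -1                             # complement two bits
restored = recall(T, probe)
```

In this small example every stored pattern is a fixed point of the network, and the probe with two complemented bits converges back to the stored pattern.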

The unlearning procedure of this network is divided into two stages. (1) Apply the data to be erased to
the network with a low (zero) global gain constant. (2) Use the same gain constant as for learning. In each step,
present the input with one bit complemented; allow $\Delta p$ to decrease to zero; restore that bit and complement
the next bit for the next step. After all bits have been complemented, one iteration is completed. Starting
with the trained weights of Fig. 1(a), two of the stored vectors (patterns #3 and #4) were erased using this
unlearning procedure. We erase pattern #4 in five iterations, then erase pattern #3 in another five iterations.
After each iteration, we test the convergence of the erased pattern. The resulting network would not converge
to the erased states after just three iterations. After five iterations of unlearning, the weight matrix, Fig. 2(a), is
very close to the original weight matrix that stored only patterns #1 and #2, as shown in Fig. 2(b). To measure
the performance of unlearning, the resulting weight matrix is normalized by dividing it by a factor F, which is

F = [ratio involving $T_u(i,j)$ and $T_i(i,j)$; expression not legible in this copy]   (7)

where $T_u(i,j)$ is the resulting weight matrix after unlearning and $T_i(i,j)$ is the ideal weight matrix. Then
a similarity measure, which is defined as the ratio of the matrix 1-norm [13] of these two weight matrices,
is applied to evaluate the performance. Fig. 3 shows the similarity measure of the weight matrix after each
iteration when unlearning using PDL from initially M=4 stored vectors to M=2 stored vectors.

Fig. 3 Unlearning of PDL from M=4 to M=2, N=20. Plot of similarity measures after each
iteration (5 iterations total for each unlearned pattern; numbered in sequence for
unlearning of pattern #3). The ideal expected results for M=3 and M=2 are labelled.
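The similarity measure based on the matrix 1-norm can be computed as sketched below. The direction of the ratio (unlearned matrix over ideal matrix) is an assumption, the normalization by F is left out because its definition is not available here, and the two small matrices are hypothetical.

```python
import numpy as np

def similarity(T_u, T_i):
    """Ratio of matrix 1-norms (maximum absolute column sum) of two matrices."""
    return np.linalg.norm(T_u, 1) / np.linalg.norm(T_i, 1)

T_ideal = np.array([[0.0, 2.0],
                    [2.0, 0.0]])
T_unlearned = np.array([[0.0, 1.5],
                        [1.9, 0.0]])
value = similarity(T_unlearned, T_ideal)
```

A value close to 1 indicates that the weight matrix after unlearning has nearly the same 1-norm as the ideal matrix.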

The  second  example  is  a  3-layer  network  with  two  fully  interconnected  visible  layers  and  a  hidden  layer.  The 
purpose  of  this  network  is  to  do  associative  mapping  between  two  visible  layers  by  using  one  hidden  layer.  The 
visible  layers  are  fully  connected  and  the  interconnections  between  the  visible  layers  and  the  hidden  layer  are 
shown  in  Fig.  4.  The  visible  layers  use  binary  neurons  to  interface  with  the  environment,  while  the  hidden  layer 
uses analog neurons. One neuron is used to calculate the average output, u(k), of the hidden layer; then the
competitive network of Fig. 5, which is used in the hidden layer, reinforces those neurons with stronger output.

Fig. 5 Competitive interconnections of the hidden layer.


The operation of the hidden layer is

$$u(k) = \frac{1}{N_3} \sum_{j=1}^{N_3} y_j^{(3)}(k) \tag{8}$$

$$y_j^{(3)}(k+1) = \Psi\Big\{ \sum_{i=1}^{N_1} w_{ij}^{(1)}\, y_i^{(1)}(k) + \sum_{i=1}^{N_2} w_{ij}^{(2)}\, y_i^{(2)}(k) + \Psi[\,y_j^{(3)}(k) - u(k)\,] - \Psi[\,u(k) - y_j^{(3)}(k)\,] \Big\} \tag{9}$$

where $\Psi(x)$ is 1 for x > 1 and 0 for x < 0; otherwise it is x. Superscripts denote the layer number, and
$N_1$, $N_2$, $N_3$ represent the number of neurons in layers 1, 2, and 3. $w_{ij}^{(l)}$ for l = 1, 2 represents the weight from
the ith neuron of layer l to the jth neuron of the hidden layer. Initially, the weights of the visible layers are set
to zero and the weights between the visible layers and the hidden layer are set to small random values. The
visible  layers  are  trained  separately  during  the  first  phase,  while  the  learning  gain  of  the  hidden  layer  is  set 
to  zero.  In  the  second  phase,  the  learning  gain  of  the  hidden  layer  and  the  visible  layers  is  nonzero;  we  apply 
the  corresponding  patterns  at  these  visible  layers  to  train  the  hidden  layer  and  the  visible  layers.  After  the 
learning  phases,  applying  a  partial  input  at  the  visible  layer  #1  will  retrieve  the  full  information  at  the  same 
layer  and  associated  data  at  the  other  visible  layer.  We  have  performed  computer  simulation  of  this  network 
with  20  neurons  in  layer  #1,  16  neurons  in  layer  #2,  and  10  neurons  in  the  hidden  layer.  Eight  patterns  were 
stored,  four  into  layer  #1  and  four  into  layer  #2,  and  associations  between  pairs  of  these  patterns  were  learned. 
The  network  randomly  selected  a  set  of  one  or  more  "representation”  neurons  in  the  hidden  layer  to  form  an 
association  between  each  pair  of  patterns.  Some  of  the  sets  of  neurons  for  different  pairs  of  patterns  were 
disjoint,  and  some  were  partially  overlapping.  Table  1  shows  the  hidden  neurons  selected  by  the  network  for 
each  associated  pair  of  patterns.  The  last  column  in  the  table  shows  the  pattern  retrieved  upon  presentation 
of  each  layer  #1  pattern.  Since  each  set  of  representation  neurons  usually  consists  of  multiple  neurons,  some 
fault  tolerance  is  provided.  However,  when  these  sets  overlap  some  interference  can  result  during  retrieval  of 
associated  patterns.  This  imperfect  mapping  results  from  the  "soft”  competitive  network  that  was  used. 
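The "soft" competition in the hidden layer can be sketched as follows, assuming the update has the form of (8) and (9): each hidden neuron is driven by the two visible layers, reinforced when its output exceeds the layer average, and suppressed when it falls below it. The weight layout (one column per hidden neuron) and the function names are illustrative assumptions.

```python
import numpy as np

def psi(x):
    """Piecewise-linear limiter: 0 below 0, 1 above 1, identity in between."""
    return np.clip(x, 0.0, 1.0)

def hidden_step(y1, y2, y3, W1, W2):
    """One update of the hidden layer outputs, following the form of (8), (9)."""
    u = y3.mean()                              # average hidden output, as in (8)
    drive = W1.T @ y1 + W2.T @ y2              # input from both visible layers
    return psi(drive + psi(y3 - u) - psi(u - y3))

# Two hidden neurons and no drive from the visible layers.
y1, y2 = np.zeros(3), np.zeros(2)
W1, W2 = np.zeros((3, 2)), np.zeros((2, 2))
y3 = np.array([0.2, 0.8])
y3_next = hidden_step(y1, y2, y3, W1, W2)
```

In this toy case the neuron below the average is driven to zero while the stronger one stays active, which illustrates why several hidden neurons can remain active for one pattern pair rather than a single strict winner.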


Table 1 Simulation results of the network of Fig. 4 and Fig. 5.

4. OPTICAL ARCHITECTURE OF PDL


A conceptual diagram of an optical implementation of PDL is shown in Fig. 6; it is somewhat similar to
Fisher's associative processor [14]. Two spatial light modulators (SLMs) are used, one for storage of the weight
matrix and one for generation of $\Delta w(k)$. In addition, two 1-D storage devices and one 1-D threshold device are
used.


S1-3 shutters
M1-3 mirrors
BS1-2 beam splitters
A iteration input
B microchannel SLM
C potential output
D 1-D threshold array
E output
F optical delay line or storage
G beam combiner
H potential output p(k) or p(k-1)
I external data input
J rotation optics
K synchronization controller
L SLM
N 1-D storage SLM for x(k)


Fig. 6 Conceptual diagram of an optical implementation of potential difference learning.


"A" is the input x(k) from the previous iteration, which is expanded vertically to illuminate the weight
storage "B". The reflected output from the microchannel SLM "B" is collected horizontally. This represents a
vector, each component of which is $\sum_i w_{ij} x_i$. It is then combined with external input "I" to produce the potential
output at position "C". The output at "C" is split into three. The first one passes through the 1-D threshold device
"D" to generate the outputs of the neurons. The second output of "C" passes through delay element "F" and shutter
S3, yielding p(k - 1) at "G". The third output path from "C" passes through shutter S2 to yield p(k) at "G".
Only one of the shutter arrays S2 and S3 can be turned on at a time. "G" is a beam combiner and its output,
either p(k) or p(k - 1), reflects off mirrors M4, M3 and is expanded horizontally to illuminate the write side
of SLM "L". A beam with intensity x(k) illuminates the read side of SLM "L" (which is read in reflection),
to form the outer product $p(k)x^T(k)$ or $p(k-1)x^T(k)$ for a 1-D array of neurons. In the first phase, $p(k)x^T(k)$ is
added to the storage SLM "B". Then $p(k-1)x^T(k)$ is applied to "B", which is operated in subtraction mode
during the second phase. These two steps calculate the potential difference and update the weights stored in
"B".

During the retrieval phase, partial input is applied to external input "I" and is then passed through threshold
device "D", rotation optics "J", mirrors M5, M1 and beam splitter BS1 to position "A" to perform the vector-matrix
computation of the potential. Part of the iterated feedback signal x(k) reflects off BS1, M2 and is enabled by shutter
S1 to store in the 1-D storage SLM "N", which is used to form the outer product during the learning phase. Mirrors
M1 and M5 are used, as shown, to implement feedback within a single layer network. For a multilayer network
M1 and M5 can be removed (or replaced with beamsplitters) to send outputs to and receive signals from other
layers.
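The two-phase weight update of the optical architecture (add the outer product with p(k), then subtract the outer product with p(k - 1)) is algebraically the same as a single update with the potential difference. The sketch below checks this numerically; the gain parameter, array sizes, and function name are illustrative assumptions, and no optical effects are modeled.

```python
import numpy as np

def two_phase_update(W, p_now, p_prev, x, gain=1.0):
    """Add p(k) x^T(k), then subtract p(k-1) x^T(k), as in the two phases."""
    W = W + gain * np.outer(p_now, x)    # phase 1: addition mode
    W = W - gain * np.outer(p_prev, x)   # phase 2: subtraction mode
    return W

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 4))              # one row of weights per neuron
x = rng.normal(size=4)
p_now, p_prev = rng.normal(size=3), rng.normal(size=3)

W_two_phase = two_phase_update(W, p_now, p_prev, x, gain=0.01)
W_direct = W + 0.01 * np.outer(p_now - p_prev, x)
```

This is why a single storage SLM that can be operated in both addition and subtraction modes suffices to apply the potential difference.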


5.  CONCLUSIONS 

This  PDL  provides  a  number  of  interesting  features  along  with  a  moderate  implementation  complexity.  It  is 
a  general  technique  that  can  be  applied  to  different  neuron  types  and  different  network  models.  Our  simulations 
indicate  that  it  learns  correctly  in  a  variety  of  networks.  We  also  described  an  unlearning  technique  for  the 
case  of  a  fully  connected  network  used  as  an  associative  memory,  which  does  not  require  any  sign  reversal  of 
the learning gain or any global access to the weights. Applications of PDL include low level processing such as
extraction of features.

Abstract: A new approach for restoration of gray level images degraded
by a known shift-invariant blur function and additive noise is
presented using a neural computational network. A neural network
model is employed to represent a possibly nonstationary image whose
gray level function is the simple sum of the neuron state variables. The
restoration procedure consists of two stages: estimation of the parameters
of the neural network model and reconstruction of images. During
the first stage, the parameters are estimated by comparing the energy
function of the network to a constrained error function. The
nonlinear restoration method is then carried out iteratively in the second
stage by using a dynamic algorithm to minimize the energy function
of the network. Owing to the model's fault-tolerant nature and
computation capability, a high-quality image is obtained using this approach.
A practical algorithm with reduced computational complexity
is also presented. Several computer simulation examples involving synthetic
and real images are given to illustrate the usefulness of our
method. The choice of the boundary values to reduce the ringing effect
is discussed, and comparisons to other restoration methods such as the
SVD pseudoinverse filter, minimum mean-square error (MMSE) filter,
and modified MMSE filter using the Gaussian Markov random field
model are given. Finally, a procedure for learning the blur parameters
from prototypes of original and degraded images is outlined.


I.  Introduction 

RESTORATION of a high-quality image from a degraded
recording is an important problem in early vision
processing. Restoration techniques are applied to remove
1) system degradations such as blur due to optical
system aberrations, atmospheric turbulence, motion, and
diffraction; and 2) statistical degradations due to noise.
Over the last 20 years, various methods such as the inverse
filter [1], Wiener filter [1], Kalman filter [2], SVD
pseudoinverse [1], [3], and many other model-based approaches
have been proposed for image restoration. One
of the major drawbacks of most of the image restoration
algorithms is the computational complexity, so much so
that many simplifying assumptions, such as wide sense
stationarity (WSS) and availability of second-order image
statistics, have been made to obtain computationally feasible
algorithms. The inverse filter method works only for
extremely high signal-to-noise ratio images. The Wiener
filter is usually implemented only after the wide sense stationary
assumption has been made for images. Furthermore,
knowledge of the power spectrum or correlation
matrix of the undegraded image is required. Often,
additional assumptions regarding boundary conditions are
made so that fast orthogonal transforms can be used. The
Kalman filter approach can be applied to nonstationary
images, but is computationally very intensive. Similar
statements can be made for the SVD pseudoinverse filter
method. Approaches based on noncausal models such as
the noncausal autoregressive or Gauss Markov random
field models [4], [5] also make assumptions such as WSS
and periodic boundary conditions. It is desirable to develop
a restoration algorithm that does not make WSS assumptions
and can be implemented in a reasonable time.
An artificial neural network system that can perform extremely
rapid computations seems to be very attractive for
image restoration in particular and image processing and
pattern recognition [6] in general.

In this paper, we use a neural network model containing
redundant neurons to restore gray level images degraded
by a known shift-invariant blur function and noise. It is
based on the method described in [7]-[9] using a simple
sum number representation [10]. The image gray levels
are represented by the simple sum of the neuron state variables,
which take binary values of 1 or 0. The observed
image is degraded by a shift-invariant function and noise.
The restoration procedure consists of two stages: estimation
of the parameters of the neural network model and
reconstruction of images. During the first stage, the parameters
are estimated by comparing the energy function
of the neural network to the constrained error function.
The nonlinear restoration algorithm is then implemented
using a dynamic iterative algorithm to minimize the energy
function of the neural network. Owing to the model's
fault-tolerant nature and computation capability, a high-quality
image is obtained using this approach. In order to
reduce computational complexity, a practical algorithm,
which has equivalent results to the original one suggested
above, is developed under the assumption that the neurons
are sequentially visited. We illustrate the usefulness of
this approach by using both synthetic and real images degraded
by a known shift-invariant blur function with or
without noise. We also discuss the problem of choosing
boundary values and introduce two methods to reduce the
ringing effect. Comparisons to other restoration methods
such as the SVD pseudoinverse filter, the minimum mean-square
error (MMSE) filter, and the modified MMSE filter
using a Gaussian Markov random field model are given
using real images. The advantages of the method developed
in this paper are: 1) the WSS assumption is not required
for the images, 2) it can be implemented rapidly, and 3)
it is fault tolerant.

In the above, the interconnection strengths (also called
weights) of the neural network for image restoration are
known from the parameters of the image degradation
model and the smoothing constraints. We also consider
learning of the parameters for the image degradation
model and formulate it as a problem of computing the
parameters from samples of the original and degraded images.
This is implemented as a secondary neural network.
A different scheme is used to represent multilevel activities
for the parameters; some of its properties are complementary
to those of the simple sum scheme. The learning
procedure is accomplished by running a greedy algorithm.
Some results of learning the blur parameters are presented
using synthetic and real image examples.

The  organization  of  this  paper  is  as  follows.  A  network 
model  containing  redundant  neurons  for  image  represen¬ 
tation  and  the  image  degradation  model  is  given  in  Sec¬ 
tion  II.  A  technique  for  parameter  estimation  is  presented 
in  Section  III.  Image  generation  using  a  dynamic  algo¬ 
rithm  is  described  in  Section  IV.  A  practical  algorithm 
with  reduced  computational  complexity  is  presented  in 
Section  V.  Computer  simulation  results  using  synthetic 
and  real  degraded  images  are  given  in  Section  VI.  Choice 
of  the  boundary  values  is  discussed  in  Section  VII.  Com¬ 
parisons  to  other  methods  are  given  in  Section  VIII.  A 
procedure  for  learning  the  blur  parameters  from  proto¬ 
types  of  original  and  degraded  images  is  outlined  in  Sec¬ 
tion  IX,  and  conclusions  and  remarks  are  included  in  Sec¬ 
tion  X. 

II.  A  Neural  Network  for  Image  Representation 

We use a neural network containing redundant neurons
for representing the image gray levels. The model consists
of $L^2 \times M$ mutually interconnected neurons, where $L$
is the size of the image and $M$ is the maximum value of the
gray level function. Let $V = \{ v_{i,k} : 1 \le i \le L^2,\ 1 \le k \le M \}$
be a binary state set of the neural network,
with $v_{i,k}$ (1 for firing and 0 for resting) denoting the state
of the $(i,k)$th neuron. Let $T_{i,k;j,l}$ denote the strength (possibly
negative) of the interconnection between neuron $(i,k)$
and neuron $(j,l)$. We require symmetry:

$$T_{i,k;j,l} = T_{j,l;i,k} \quad \text{for } 1 \le i, j \le L^2 \text{ and } 1 \le l, k \le M.$$

We also allow for neurons to have self-feedback, i.e.,
$T_{i,k;i,k} \ne 0$. In this model, each neuron $(i,k)$ randomly
and asynchronously receives inputs $\sum T_{i,k;j,l} v_{j,l}$ from all
neurons and a bias input $I_{i,k}$:

$$u_{i,k} = \sum_{j=1}^{L^2} \sum_{l=1}^{M} T_{i,k;j,l}\, v_{j,l} + I_{i,k}. \tag{1}$$

Each $u_{i,k}$ is fed back to corresponding neurons after
thresholding:

$$v_{i,k} = g(u_{i,k}) \tag{2}$$

where $g(x)$ is a nonlinear function whose form can be
taken as

$$g(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{if } x < 0. \end{cases} \tag{3}$$

In this model, the state of each neuron is updated by using
the latest information about other neurons.
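
The update described by (1)-(3) can be sketched in a few lines. This is only an illustration: the neuron pair $(i,k)$ is flattened into a single index, and the three-neuron weights, biases, and starting states below are hypothetical values chosen for the example, not taken from the paper.

```python
def g(x):
    """Threshold function of (3): 1 for non-negative input, otherwise 0."""
    return 1 if x >= 0 else 0


def neuron_input(T, I, v, n):
    """Total input (1) to neuron n: weighted sum of all states plus bias."""
    return sum(T[n][m] * v[m] for m in range(len(v))) + I[n]


def update_neuron(T, I, v, n):
    """Apply (1) and (2) to neuron n in place, using the latest states."""
    v[n] = g(neuron_input(T, I, v, n))
    return v[n]


# Hypothetical 3-neuron network: symmetric weights with self-feedback.
T = [[-1.0, 0.5, 0.0],
     [0.5, -1.0, 0.5],
     [0.0, 0.5, -1.0]]
I = [0.2, -0.1, 0.6]
v = [0, 1, 0]

# Visit the neurons one after another; each update sees the newest states.
for n in range(3):
    update_neuron(T, I, v, n)
```

Because `v` is modified in place, the second and third updates already use the new state of the first neuron, which is the "latest information" behaviour described above.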

The image is described by a finite set of gray level functions
$\{ x(i,j) : 1 \le i, j \le L \}$, with $x(i,j)$ (a positive
integer) denoting the gray level of the pixel $(i,j)$.
The image gray level function can be represented by
a simple sum of the neuron state variables as

$$x(i,j) = \sum_{k=1}^{M} v_{m,k} \tag{4}$$

where $m = (i - 1) \times L + j$. Here the gray level functions
have degenerate representations. Use of this redundant
number representation scheme yields advantages such as
fault tolerance and faster convergence to the solution [10].
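
To make the degeneracy of the simple sum representation (4) concrete, the following sketch counts how many binary neuron patterns encode each gray level. The choice $M = 4$ and the helper names are assumptions made for the illustration only.

```python
from itertools import product


def gray_level(states):
    """Gray level of one pixel as the simple sum (4) of its M binary neurons."""
    return sum(states)


def pixel_index(i, j, L):
    """Lexicographic index m = (i - 1) * L + j for the 1-based pixel (i, j)."""
    return (i - 1) * L + j


M = 4  # hypothetical number of neurons per pixel

# Count how many distinct neuron patterns map to each gray level 0..M.
counts = {}
for states in product((0, 1), repeat=M):
    x = gray_level(states)
    counts[x] = counts.get(x, 0) + 1
```

Every gray level other than 0 and $M$ is reached by several patterns, so a single faulty neuron can be compensated by another one, which is one way to read the fault tolerance claim.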

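By using the lexicographic notation, the image degradation
model can be written as

$$Y = HX + N \tag{5}$$

where $H$ is the "blur matrix" corresponding to a blur
function, $N$ is the signal-independent white noise, and $X$
and $Y$ are the original and degraded images, respectively.
Furthermore, $H$ and $N$ can be represented as

$$H = \begin{bmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,L^2} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,L^2} \\ \vdots & \vdots & & \vdots \\ h_{L^2,1} & h_{L^2,2} & \cdots & h_{L^2,L^2} \end{bmatrix} \tag{6}$$

and

$$N = \begin{bmatrix} n_1 & n_2 & \cdots & n_{L^2} \end{bmatrix}^{t} \tag{7}$$

respectively. Vectors $X$ and $Y$ have similar representations.
Equation (5) is similar to the simultaneous equations
solution of [10], but differs in that it includes a noise
term.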

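The shift-invariant blur function can be written as a
convolution over a small window; for instance, it takes
the form

$$h(k,l) = \begin{cases} \frac{1}{2} & \text{if } k = 0,\ l = 0 \\ \frac{1}{16} & \text{if } |k|, |l| \le 1,\ (k,l) \ne (0,0); \end{cases} \tag{8}$$

accordingly, the "blur matrix" $H$ will be a block Toeplitz
or block circulant matrix (if the image has periodic
boundaries). The block circulant matrix corresponding to
(8) can be written as

$$H = \begin{bmatrix} H_0 & H_1 & 0 & \cdots & 0 & H_1 \\ H_1 & H_0 & H_1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ H_1 & 0 & 0 & \cdots & H_1 & H_0 \end{bmatrix} \tag{9}$$

where

$$H_0 = \begin{bmatrix} \frac{1}{2} & \frac{1}{16} & 0 & \cdots & 0 & \frac{1}{16} \\ \frac{1}{16} & \frac{1}{2} & \frac{1}{16} & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ \frac{1}{16} & 0 & 0 & \cdots & \frac{1}{16} & \frac{1}{2} \end{bmatrix}, \qquad H_1 = \begin{bmatrix} \frac{1}{16} & \frac{1}{16} & 0 & \cdots & 0 & \frac{1}{16} \\ \frac{1}{16} & \frac{1}{16} & \frac{1}{16} & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ \frac{1}{16} & 0 & 0 & \cdots & \frac{1}{16} & \frac{1}{16} \end{bmatrix} \tag{10}$$

and $0$ is the null matrix whose elements are all zeros.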
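
The block circulant structure can be checked numerically on a small image. The sketch below builds the blur matrix for a 3 × 3 window with periodic boundaries; the kernel weights (1/2 at the centre, 1/16 for the eight neighbours) and the image size $L = 4$ are assumptions made for the illustration, and the function names are hypothetical.

```python
def blur_kernel(k, l):
    """3 x 3 blur function; the weights 1/2 and 1/16 are assumed for this sketch."""
    if k == 0 and l == 0:
        return 0.5
    if abs(k) <= 1 and abs(l) <= 1:
        return 1.0 / 16.0
    return 0.0


def blur_matrix(L):
    """L^2 x L^2 blur matrix for an L x L image with periodic boundaries."""
    n = L * L
    H = [[0.0] * n for _ in range(n)]
    for i in range(L):
        for j in range(L):
            row = i * L + j  # 0-based lexicographic index of pixel (i, j)
            for k in (-1, 0, 1):
                for l in (-1, 0, 1):
                    col = ((i + k) % L) * L + (j + l) % L
                    H[row][col] += blur_kernel(k, l)
    return H


def block(H, L, a, b):
    """Return the L x L block of H in block row a and block column b."""
    return [r[b * L:(b + 1) * L] for r in H[a * L:(a + 1) * L]]


L = 4
H = blur_matrix(L)
H0 = block(H, L, 0, 0)  # diagonal block
H1 = block(H, L, 0, 1)  # neighbouring block
```

For this small case the diagonal blocks are all equal, the blocks next to the diagonal (including the wrap-around corner) are all equal, and the remaining blocks are zero, which is the pattern written in (9).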

III.  Estimation  of  Model  Parameters 

The neural model parameters, i.e., the interconnection
strengths and bias inputs, can be determined in terms of
the energy function of the neural network. As defined in
[7], the energy function of the neural network can be written
as

$$E = -\frac{1}{2} \sum_{i=1}^{L^2} \sum_{j=1}^{L^2} \sum_{k=1}^{M} \sum_{l=1}^{M} T_{i,k;j,l}\, v_{i,k} v_{j,l} - \sum_{i=1}^{L^2} \sum_{k=1}^{M} I_{i,k} v_{i,k}. \tag{11}$$

In order to use the spontaneous energy-minimization process
of the neural network, we reformulate the restoration
problem as one of minimizing an error function with constraints
defined as

$$E = \frac{1}{2} \| Y - HX \|^2 + \frac{1}{2} \lambda \| DX \|^2 \tag{12}$$

where $\| Z \|$ is the $L_2$ norm of $Z$ and $\lambda$ is a constant. Such
a constrained error function is widely used in the image
restoration problems [1] and is also similar to the regularization
techniques used in early vision problems [11].
The first term in (12) is to seek an $X$ such that $HX$ approximates
$Y$ in a least squares sense. Meanwhile, the
second term is a smoothness constraint on the solution $X$.
The constant $\lambda$ determines their relative importance to
achieve both noise suppression and ringing reduction.

In general, if $H$ is a low-pass distortion, then $D$ is a
high-pass filter. A common choice of $D$ is a second-order
differential operator which can be approximated as a local
window operator in the 2-D discrete case. For instance,
if $D$ is a Laplacian operator

$$\nabla^2 = \frac{\partial^2}{\partial i^2} + \frac{\partial^2}{\partial j^2} \tag{13}$$

it can be approximated as a window operator

$$\begin{bmatrix} 1 & 4 & 1 \\ 4 & -20 & 4 \\ 1 & 4 & 1 \end{bmatrix}. \tag{14}$$

Then $D$ will be a block Toeplitz matrix similar to (9).
Expanding (12) and then replacing $x_i$ by (4), we have

$$\begin{aligned} E &= \frac{1}{2} \sum_{p=1}^{L^2} \Bigl( y_p - \sum_{i=1}^{L^2} h_{p,i} x_i \Bigr)^2 + \frac{1}{2} \lambda \sum_{p=1}^{L^2} \Bigl( \sum_{i=1}^{L^2} d_{p,i} x_i \Bigr)^2 \\ &= \frac{1}{2} \sum_{i=1}^{L^2} \sum_{j=1}^{L^2} \sum_{k=1}^{M} \sum_{l=1}^{M} \sum_{p=1}^{L^2} h_{p,i} h_{p,j}\, v_{i,k} v_{j,l} \\ &\quad + \frac{1}{2} \lambda \sum_{i=1}^{L^2} \sum_{j=1}^{L^2} \sum_{k=1}^{M} \sum_{l=1}^{M} \sum_{p=1}^{L^2} d_{p,i} d_{p,j}\, v_{i,k} v_{j,l} \\ &\quad - \sum_{i=1}^{L^2} \sum_{k=1}^{M} \sum_{p=1}^{L^2} y_p h_{p,i}\, v_{i,k} + \frac{1}{2} \sum_{p=1}^{L^2} y_p^2. \end{aligned} \tag{15}$$

By comparing the terms in (15) to the corresponding terms
in (11) and ignoring the constant term $\frac{1}{2} \sum_{p=1}^{L^2} y_p^2$, we can
determine the interconnection strengths and bias inputs as

$$T_{i,k;j,l} = -\sum_{p=1}^{L^2} h_{p,i} h_{p,j} - \lambda \sum_{p=1}^{L^2} d_{p,i} d_{p,j} \tag{16}$$

and

$$I_{i,k} = \sum_{p=1}^{L^2} y_p h_{p,i} \tag{17}$$

where $h_{i,j}$ and $d_{i,j}$ are the elements of the matrices $H$ and
$D$, respectively. Two interesting aspects of (16) and (17)
should be pointed out: 1) the interconnection strengths are
independent of subscripts $k$ and $l$ and the bias inputs are
independent of subscript $k$, and 2) the self-connection
$T_{i,k;i,k}$ is not equal to zero, which requires self-feedback
for neurons.
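
As an illustration of (16) and (17), the sketch below computes the pixel-level interconnection strengths and bias inputs from given matrices $H$ and $D$. It uses a hypothetical 1-D example (four samples, a three-tap circulant blur, and a second-difference operator standing in for the Laplacian) so that the numbers stay small; none of these values come from the paper.

```python
def interconnections(H, D, lam):
    """Weights of (16): T[i][j] = -sum_p h[p][i]*h[p][j] - lam * sum_p d[p][i]*d[p][j]."""
    n = len(H[0])
    return [[-sum(H[p][i] * H[p][j] for p in range(len(H)))
             - lam * sum(D[p][i] * D[p][j] for p in range(len(D)))
             for j in range(n)] for i in range(n)]


def bias_inputs(H, y):
    """Bias inputs of (17): I[i] = sum_p y[p] * h[p][i]."""
    return [sum(y[p] * H[p][i] for p in range(len(H))) for i in range(len(H[0]))]


def circulant(first_row):
    """Circulant matrix whose row i is first_row shifted right by i places."""
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]


# Hypothetical 1-D stand-ins for the blur matrix and the differential operator.
H = circulant([0.5, 0.25, 0.0, 0.25])
D = circulant([-2.0, 1.0, 0.0, 1.0])
y = [4.0, 8.0, 4.0, 0.0]

T = interconnections(H, D, 0.5)
I = bias_inputs(H, y)
```

The resulting matrix is symmetric and has a negative diagonal, matching the two observations above: the weights do not depend on $k$ or $l$, and the self-connection is nonzero.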

From (16), one can see that the interconnection
strengths are determined by the shift-invariant blur function,
differential operator, and constant $\lambda$. Hence, $T_{i,k;j,l}$
can be computed without error provided the blur function
is known. However, the bias inputs are functions of the
observed degraded image. If the image is degraded by a
shift-invariant blur function only, then $I_{i,k}$ can be estimated
perfectly. Otherwise, $I_{i,k}$ is affected by noise. The
reasoning behind this statement is as follows. By replacing
$y_p$ by $\sum_{j=1}^{L^2} h_{p,j} x_j + n_p$, we have

$$I_{i,k} = \sum_{p=1}^{L^2} \Bigl( \sum_{j=1}^{L^2} h_{p,j} x_j + n_p \Bigr) h_{p,i} = \sum_{p=1}^{L^2} \sum_{j=1}^{L^2} h_{p,j} x_j h_{p,i} + \sum_{p=1}^{L^2} n_p h_{p,i}. \tag{18}$$

The second term in (18) represents the effects of noise. If
the signal-to-noise ratio (SNR), defined by

$$\mathrm{SNR} = 10 \log_{10} \frac{\sigma_s^2}{\sigma_n^2} \tag{19}$$

where $\sigma_s^2$ and $\sigma_n^2$ are variances of signal and noise, respectively,
is low, then we have to choose a large $\lambda$ to
suppress effects due to noise. It seems that in the absence
of noise, the parameters can be estimated perfectly, ensuring
exact recovery of the image as error function $E$
tends to zero. However, the problem is not so simple because
the restoration performance depends on both the parameters
and the blur function when a mean-square error
or least square error such as (12) is used. A discussion
about the effect of blur function is given in Section X.

IV.  Restoration 

Restoration is carried out by neuron evaluation and an
image construction procedure. Once the parameters
$T_{i,k;j,l}$ and $I_{i,k}$ are obtained using (16) and (17), each neuron
can randomly and asynchronously evaluate its state
and readjust accordingly using (1) and (2). When one
quasi-minimum energy point is reached, the image can be
constructed using (4).

However, this neural network has self-feedback, i.e.,
$T_{i,k;i,k} \ne 0$. As a result, the energy function $E$ does not
always decrease monotonically with a transition. This is
explained below. Define the state change $\Delta v_{i,k}$ of neuron
$(i,k)$ and energy change $\Delta E$ as

$$\Delta v_{i,k} = v_{i,k}^{new} - v_{i,k}^{old} \quad \text{and} \quad \Delta E = E^{new} - E^{old}.$$

Consider the energy function

$$E = -\frac{1}{2} \sum_{i=1}^{L^2} \sum_{j=1}^{L^2} \sum_{k=1}^{M} \sum_{l=1}^{M} T_{i,k;j,l}\, v_{i,k} v_{j,l} - \sum_{i=1}^{L^2} \sum_{k=1}^{M} I_{i,k} v_{i,k}.$$

Then the change $\Delta E$ due to a change $\Delta v_{i,k}$ is given by

$$\Delta E = -\Bigl( \sum_{j=1}^{L^2} \sum_{l=1}^{M} T_{i,k;j,l}\, v_{j,l} + I_{i,k} \Bigr) \Delta v_{i,k} - \frac{1}{2} T_{i,k;i,k} (\Delta v_{i,k})^2 \tag{21}$$

which is not always negative. For instance, if

$$v_{i,k}^{old} = 0, \qquad u_{i,k} = \sum_{j=1}^{L^2} \sum_{l=1}^{M} T_{i,k;j,l}\, v_{j,l} + I_{i,k} > 0$$

and the threshold function is as in (3), then $v_{i,k}^{new} = 1$ and
$\Delta v_{i,k} > 0$. Thus, the first term in (21) is negative. But

$$T_{i,k;i,k} = -\sum_{p=1}^{L^2} h_{p,i}^2 - \lambda \sum_{p=1}^{L^2} d_{p,i}^2 < 0$$

with $\lambda > 0$, leading to

$$-\frac{1}{2} T_{i,k;i,k} (\Delta v_{i,k})^2 > 0.$$

When the first term is smaller in magnitude than the second term in (21),
then $\Delta E > 0$ (we have observed this in our experiment),
which means $E$ is not a Lyapunov function. Consequently,
the convergence of the network is not guaranteed
[12].
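
A small numerical case shows how a threshold update can raise the energy when the self-connection is negative. The two-neuron weights and biases below are hypothetical; the sketch evaluates the energy change of (21) and compares it with the direct difference of the energy function.

```python
def energy(T, I, v):
    """Energy of the network: -1/2 * sum T[a][b] v[a] v[b] - sum I[a] v[a]."""
    n = len(v)
    quad = sum(T[a][b] * v[a] * v[b] for a in range(n) for b in range(n))
    return -0.5 * quad - sum(I[a] * v[a] for a in range(n))


def delta_energy(T, I, v, a, dv):
    """Energy change (21) when neuron a changes by dv; v holds the old states."""
    u = sum(T[a][b] * v[b] for b in range(len(v))) + I[a]
    return -u * dv - 0.5 * T[a][a] * dv * dv


# Hypothetical 2-neuron network with negative self-feedback.
T = [[-2.0, 0.5],
     [0.5, -2.0]]
I = [0.25, 0.0]
v_old = [0, 1]

# Neuron 0 receives u = 0.75 > 0, so thresholding switches it on (dv = +1).
dE = delta_energy(T, I, v_old, 0, 1)
v_new = [1, 1]
dE_direct = energy(T, I, v_new) - energy(T, I, v_old)
```

Here the first term of (21) contributes -0.75 and the self-feedback term contributes +1, so the transition that the threshold rule asks for increases the energy by 0.25.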

Thus, depending on whether convergence to a local
minimum or a global minimum is desired, we can design
a deterministic or stochastic decision rule. The deterministic
rule is to take a new state $v_{i,k}^{new}$ of neuron $(i,k)$ if the
energy change $\Delta E$ due to state change $\Delta v_{i,k}$ is less than
zero. If $\Delta E$ due to state change is $\ge 0$, no state change
is effected. One can also design a stochastic rule similar
to the one used in simulated annealing techniques [13],
[14]. The details of this stochastic scheme are given as
follows.

Define a Boltzmann distribution by

$$\frac{p_{new}}{p_{old}} = e^{-\Delta E / T}$$

where $p_{new}$ and $p_{old}$ are the probabilities of the new and
old global state, respectively, $\Delta E$ is the energy change,
and $T$ is the parameter which acts like temperature. A new
state $v_{i,k}^{new}$ is taken if

$$\frac{p_{new}}{p_{old}} > 1, \quad \text{or if} \quad \frac{p_{new}}{p_{old}} \le 1 \ \text{but} \ \frac{p_{new}}{p_{old}} > \xi$$

where $\xi$ is a random number uniformly distributed in the
interval [0, 1].
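
The two decision rules can be written as small acceptance tests. This is a sketch under assumptions: the function names are hypothetical, and Python's `random()` draws from [0, 1) rather than the closed interval, which makes no practical difference here.

```python
import math
import random


def accept_deterministic(delta_e):
    """Deterministic rule: take the new state only if the energy decreases."""
    return delta_e < 0


def accept_stochastic(delta_e, temperature, rng=random):
    """Stochastic rule: accept if p_new/p_old > 1, or else if it exceeds a uniform draw."""
    if delta_e < 0:
        return True  # ratio exp(-delta_e / T) is greater than 1
    ratio = math.exp(-delta_e / temperature)
    return ratio > rng.random()


# Example: at temperature 1, an uphill move of size 1 is accepted
# with probability exp(-1), i.e. a little over one third of the time.
rng = random.Random(0)
accepted = sum(accept_stochastic(1.0, 1.0, rng) for _ in range(10000))
```

Lowering the temperature makes uphill moves rarer, so the stochastic rule approaches the deterministic one as the temperature goes to zero.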

The restoration algorithm is summarized below.
Algorithm  1: 

1)  Set  the  initial  state  of  the  neurons. 

2)  Update  the  state  of  all  neurons  randomly  and  asyn¬ 
chronously  according  to  the  decision  rule. 

3)  Check  the  energy  function;  if  energy  does  not 
change,  go  to  step  4);  otherwise,  go  back  to  step  2). 

4)  Construct  an  image  using  (4). 

V.  A  Practical  Algorithm 

The algorithm described above is difficult to simulate
on a conventional computer owing to high computational
complexity, even for images of reasonable size. For instance,
if we have an $L \times L$ image with $M$ gray levels,
then $L^2 M$ neurons and $\frac{1}{2} L^4 M^2$ interconnections are required
and $L^4 M^2$ additions and multiplications are needed
at each iteration. Therefore, the space and time complexities
are $O(L^4 M^2)$ and $O(L^4 M^2 K)$, respectively, where
$K$, typically 10-100, is the number of iterations. Usually,
$L$ and $M$ are 256-1024 and 256, respectively. However,
simplification is possible if the neurons are sequentially
updated.
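
To give a feeling for these counts, the following sketch evaluates them for the smallest typical case quoted above ($L = 256$, $M = 256$, $K = 10$). The function name is hypothetical; the formulas are the ones stated in the text.

```python
def full_network_cost(L, M, K):
    """Neuron, interconnection, and operation counts for the full network."""
    neurons = L * L * M
    interconnections = (L ** 4) * (M ** 2) // 2
    operations = (L ** 4) * (M ** 2) * K  # additions/multiplications over K iterations
    return neurons, interconnections, operations


neurons, links, total_ops = full_network_cost(L=256, M=256, K=10)
```

Even this modest case needs about 1.7 × 10^7 neurons and about 1.4 × 10^14 interconnections, which is why a direct simulation is impractical.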

In order to simplify the algorithm, we begin by reconsidering
(1) and (2) of the neural network. As noted earlier,
the interconnection strengths given in (16) are independent
of subscripts $k$ and $l$ and the bias inputs given in
(17) are independent of subscript $k$; the $M$ neurons used
to represent the same image gray level function have the
same interconnection strengths and bias inputs. Hence,
one set of interconnection strengths and one bias input are
sufficient for every gray level function, i.e., the dimensions
of the interconnection matrix $T$ and bias input matrix
$I$ can be reduced by a factor of $M^2$. From (1), all
inputs received by a neuron, say the $(i,k)$th neuron, can
be written as

$$u_{i,k} = \sum_{j=1}^{L^2} T_{i,\cdot;j,\cdot} \Bigl( \sum_{l=1}^{M} v_{j,l} \Bigr) + I_{i,\cdot} = \sum_{j=1}^{L^2} T_{i,\cdot;j,\cdot}\, x_j + I_{i,\cdot} \tag{22}$$

where we have used (4) and $x_j$ is the gray level function
of the $j$th image pixel. The symbol "$\cdot$" in the subscripts
means that the $T_{i,\cdot;j,\cdot}$ and $I_{i,\cdot}$ are independent of $k$. Equation
(22) suggests that we can use a multivalue number to
replace the simple sum number. Since the interconnection
strengths are determined by the blur function, the differential
operator, and the constant $\lambda$ as shown in (16), it is
easy to see that if the blur function is local, then most
interconnection strengths are zeros and the neurons are
locally connected. Therefore, most elements of the interconnection
matrix $T$ are zeros. If the blur function is shift
invariant taking the form in (8), then the interconnection
matrix is block Toeplitz so that only a few elements need
to be stored. Based on the value of inputs $u_{i,k}$, the state of
the $(i,k)$th neuron is updated by applying a decision rule.
The state change of the $(i,k)$th neuron in turn causes the
gray level function $x_i$ to change:

$$x_i^{new} = \begin{cases} x_i^{old} & \text{if } \Delta v_{i,k} = 0 \\ x_i^{old} + 1 & \text{if } \Delta v_{i,k} = 1 \\ x_i^{old} - 1 & \text{if } \Delta v_{i,k} = -1 \end{cases} \tag{23}$$

where $\Delta v_{i,k} = v_{i,k}^{new} - v_{i,k}^{old}$ is the state change of the $(i,k)$th
neuron. The superscripts "new" and "old" are for
after and before updating, respectively. We use $x_i$ to represent
the gray level value as well as the output of $M$ neurons
representing $x_i$. Assuming that the neurons of the
network are sequentially visited, it is straightforward to
show that the updating procedure can be reformulated as

$$u_{i,k} = \sum_{j=1}^{L^2} T_{i,\cdot;j,\cdot}\, x_j + I_{i,\cdot} \tag{24}$$

$$\Delta v_{i,k} = g(u_{i,k}) = \begin{cases} \Delta v_{i,k} = 0 & \text{if } u_{i,k} = 0 \\ \Delta v_{i,k} = 1 & \text{if } u_{i,k} > 0 \\ \Delta v_{i,k} = -1 & \text{if } u_{i,k} < 0 \end{cases} \tag{25}$$

$$x_i^{new} = \begin{cases} x_i^{old} + \Delta v_{i,k} & \text{if } \Delta E < 0 \\ x_i^{old} & \text{if } \Delta E \ge 0. \end{cases} \tag{26}$$

Note that the stochastic decision rule can also be used in
(26). In order to limit the gray level function to the range
0-255 after each updating step, we have to check the value
of the gray level function $x_i^{new}$. Equations (24), (25), and
(26) give a much simpler algorithm. This algorithm is
summarized below.

Algorithm  2: 

1) Take the degraded image as the initial value.

2) Sequentially visit all numbers (image pixels). For
each number, use (24), (25), and (26) to update it repeatedly
until there is no further change, i.e., if $\Delta v_{i,k} = 0$ or
energy change $\Delta E \ge 0$; then move to the next one.

3) Check the energy function; if energy does not
change anymore, a restored image is obtained; otherwise,
go back to step 2) for another iteration.
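
The three steps of Algorithm 2 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it works on a 1-D signal with dense matrices, uses the deterministic rule only, stops when a full sweep changes nothing, and adds loop bounds (`max_level`, `max_sweeps`) as a safeguard. The five-sample test signal and the circulant matrices are hypothetical.

```python
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def error_function(H, D, lam, y, x):
    """Constrained error (12): 0.5 * ||y - Hx||^2 + 0.5 * lam * ||Dx||^2."""
    r = [yi - hi for yi, hi in zip(y, matvec(H, x))]
    s = matvec(D, x)
    return 0.5 * sum(t * t for t in r) + 0.5 * lam * sum(t * t for t in s)


def restore(H, D, lam, y, max_level=255, max_sweeps=1000):
    n = len(y)
    # Pixel-level weights (16) and bias inputs (17).
    T = [[-sum(H[p][i] * H[p][j] for p in range(n))
          - lam * sum(D[p][i] * D[p][j] for p in range(n))
          for j in range(n)] for i in range(n)]
    I = [sum(y[p] * H[p][i] for p in range(n)) for i in range(n)]
    # Step 1: degraded image (rounded and clipped) as the initial value.
    x = [min(max_level, max(0, int(round(v)))) for v in y]
    for _sweep in range(max_sweeps):
        changed = False
        for i in range(n):                       # step 2: sequential visit
            for _step in range(max_level):       # repeated unit updates of pixel i
                u = sum(T[i][j] * x[j] for j in range(n)) + I[i]   # (24)
                dv = (u > 0) - (u < 0)                              # (25)
                if dv == 0:
                    break
                delta_e = -u * dv - 0.5 * T[i][i] * dv * dv        # (28)
                if delta_e >= 0 or not 0 <= x[i] + dv <= max_level:
                    break
                x[i] += dv                                          # (26)
                changed = True
        if not changed:                          # step 3: nothing changes any more
            break
    return x


def circulant(first_row):
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]


# Hypothetical noise-free 1-D example with a three-tap periodic blur.
H = circulant([0.5, 0.25, 0.0, 0.0, 0.25])
D = circulant([-2.0, 1.0, 0.0, 0.0, 1.0])
x_true = [0, 4, 8, 4, 0]
y = matvec(H, x_true)
x_hat = restore(H, D, 0.0, y)
```

Every accepted update lowers the error function (12), so the result is never worse than the degraded signal used as the starting point; whether it reaches the original signal exactly depends on the conditioning of the blur.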

The calculations of the inputs $u_{i,k}$ of the $(i,k)$th neuron
and the energy change $\Delta E$ can be simplified further.
When we update the same image gray level function repeatedly,
the input received by the current neuron $(i,k)$
can be computed by making use of the previous result

$$u_{i,k} = u_{i,k-1} + \Delta v_{i,k}\, T_{i,\cdot;i,\cdot} \tag{27}$$

where $u_{i,k-1}$ is the input received by the $(i,k-1)$th
neuron. The energy change $\Delta E$ due to the state change of
the $(i,k)$th neuron can be calculated as

$$\Delta E = -u_{i,k}\, \Delta v_{i,k} - \frac{1}{2} T_{i,\cdot;i,\cdot} (\Delta v_{i,k})^2. \tag{28}$$

If the blur function is shift invariant, all these simplifications
reduce the space and time complexities significantly
from $O(L^4 M^2)$ and $O(L^4 M^2 K)$ to $O(L^2)$ and
$O(M L^2 K)$, respectively. Since every gray level function
needs only a few updating steps after the first iteration,
the computation at each iteration is $O(L^2)$. The resulting
algorithm can be easily simulated on minicomputers for
images as large as 512 × 512.
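
The saving offered by (27) is that the input of a pixel need not be recomputed from scratch after a unit change of that same pixel. The sketch below checks this on a hypothetical two-pixel example; the weights and biases are made up for the illustration.

```python
def pixel_input(T, I, x, i):
    """Full recomputation of the input to pixel i, as in (24)."""
    return sum(T[i][j] * x[j] for j in range(len(x))) + I[i]


def incremental_input(u_prev, dv, t_ii):
    """Reuse the previous input of the same pixel, in the spirit of (27)."""
    return u_prev + dv * t_ii


# Hypothetical pixel-level weights and bias inputs.
T = [[-0.375, -0.25],
     [-0.25, -0.375]]
I = [1.5, 2.0]
x = [1, 4]

u_old = pixel_input(T, I, x, 0)
dv = 1
x[0] += dv                                    # unit change of pixel 0
u_new = incremental_input(u_old, dv, T[0][0])
```

The incremental value agrees with a full recomputation, but costs one multiplication and one addition instead of a sum over all connected pixels.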


VI.  Computer  Simulations 

The  practical  algorithm  described  in  the  previous  sec¬ 
tion  was  applied  to  synthetic  and  real  images  on  a  Sun-3/ 
160  Workstation.  In  all  cases,  only  the  deterministic  de¬ 
cision  rule  was  used.  The  results  are  summarized  in  Figs. 
1  and  2. 

Fig. 1 shows the results for a synthetic image. The original
image shown in Fig. 1(a) is of size 32 × 32 with
three gray levels. The image was degraded by convolving
with a 3 × 3 blur function as in (8) using circulant boundary
conditions; 22 dB white Gaussian noise was added
after convolution. A perfect image was obtained after six
iterations without preprocessing. We set the initial state
of all neurons to equal 1, i.e., firing, and chose $\lambda = 0$
due to the well conditioning of the blur function.

Fig. 1. Restoration of noisy blurred synthetic image. (a) Original image.
(b) Degraded image. (c) Result after six iterations.

Fig. 2. Restoration of noisy blurred real image. (a) Original girl image.
(b) Image degraded by 5 × 5 uniform blur and quantization noise. (c)
The restored image using inverse filter. (d) The restored image using our
approach.

Fig.  2(a)  shows  the  original  girl  image.  The  original 
image  is  of  size  256  x  256  with  256  gray  levels.  The 
variance  of  the  original  image  is  2797.141.  It  was  de¬ 
graded  by  a  5  x  5  uniform  blur  function.  A  small  amount 
of  quantization  noise  was  introduced  by  quantizing  the 
convolution  results  to  8  bits.  The  noisy  blurred  image  is 
shown  in  Fig.  2(b).  For  comparison  purpose,  Fig.  2(c) 
shows  the  output  of  an  inverse  filter  [15],  completely 
overridden  by  the  amplified  noise  and  the  ringing  effects 
due  to  the  ill-conditioned  blur  matrix  H.  Since  the  blur 
matrix  H  corresponding  to  the  5  x  5  uniform  blur  func¬ 
tion  is  not  singular,  the  pseudoinverse  filter  [15]  and  the 
inverse filter have the same output. The image restored by
our approach is shown in Fig. 2(d). In order to avoid
the  ringing  effects  due  to  the  boundary  conditions,  we  took 
4  pixel  wide  boundaries,  i.e.,  the  first  and  last  four  rows 
and  columns,  from  the  original  image  and  updated  the  in¬ 
terior  region  (248  x  248)  of  the  image  only.  The  noisy 


blurred image was used as an initial condition for accelerating
the convergence. The constant $\lambda$ was set to zero
because  of  small  noise  and  good  boundary  values.  The 
restored  image  in  Fig.  2(d)  was  obtained  after  213  itera¬ 
tions.  The  square  error  (i.e.,  energy  function)  defined  in 
(12)  is  0.02543  and  the  square  error  between  the  original 
and  the  restored  image  is  66.5027. 

VII.  Choosing  Boundary  Values 

As  mentioned  in  [16],  choosing  boundary  values  is  a 
common  problem  for  techniques  ranging  from  determin¬ 
istic  inverse  filter  algorithms  to  stochastic  Kalman  filters. 
In  these  algorithms,  boundary  values  determine  the  entire 
solution when the blur is uniform [17]. The same problem
occurs  in  the  neural  network  approach.  Since  the  5  x  5 
uniform  blur  function  is  ill  conditioned,  improper  bound¬ 
ary  values  may  cause  ringing  which  may  affect  the  re¬ 
stored  image  completely.  For  example,  appending  zeros 
to  the  image  as  boundary  values  introduces  a  sharp  edge 
at  the  image  border  and  triggers  ringing  in  the  restored 
image  even  if  the  image  has  zero  mean.  Another  proce¬ 
dure  is  to  assume  a  periodic  boundary.  When  the  left  (top) 
and  right  (bottom)  borders  of  the  image  are  different,  a 
sharp  edge  is  formed  and  ringing  results  even  though  the 
degraded  image  has  been  formed  by  blurring  with  peri¬ 
odic  boundary  conditions.  The  drawbacks  of  these  two 
assumptions  for  boundary  values  were  reported  in  [16], 
[2],  [18]  for  the  2-D  Kalman  filtering  technique.  We  also 
tested  our  algorithm  using  these  two  assumptions  for 
boundary  values;  the  results  indicate  the  restored  images 
were  seriously  affected  by  ringing. 

In the last section, to avoid the ringing effect, we took
4 pixel wide borders from the original image as boundary
values for restoration. Since the original image is not
always available in practice, an alternative to eliminate
the ringing effect caused by sharp false edges is to use the
blurred noisy boundaries from the degraded image. Fig.
3(a)  shows  the  restored  image  using  the  first  and  last  four 
rows  and  columns  of  the  blurred  noisy  image  in  Fig.  2(b) 
as  boundary  values.  In  the  restored  image,  there  still  ex¬ 
ists  some  ringing  due  to  the  naturally  occurring  sharp 
edges  in  the  region  near  the  borders  in  the  original  image, 
but  not  due  to  boundary  values.  A  typical  cut  of  the  re¬ 
stored  image  to  illustrate  ringing  near  the  borders  is  shown 
in  Fig.  4.  To  remove  the  ringing  near  the  borders  caused 
by  naturally  occurring  sharp  edges  in  the  original  image, 
we  suggest  the  following  techniques. 

First, divide the image into three regions: border, subborder,
and interior region as shown in Fig. 5. For the 5
× 5 uniform blur case, the border region will be 4 pixels
wide due to the boundary effect of the bias input $I_{i,k}$ in
(17), and the subborder region will be 4 or 8 pixels wide.
In fact, the width of the subborder region will be image
dependent. If the regions near the border are smooth, then
the width of the subborder region will be small or even
zero. If the border contains many sharp edges, the width
will be large. For the real girl image, we chose the width
of the subborder region to be 8 pixels. We suggest using
one of the following two methods.

Fig. 3. Results using blurred noisy boundaries. (a) Blurred noisy boundaries.
(b) Method 1. (c) Method 2.

Method 1: In the case of small noise, such as quantization
error noise, the blurred image is usually smooth.
Therefore, we restricted the difference between the restored
and blurred image in the subborder region to a certain
range to reduce the ringing effect. Mathematically,
this constraint can be written as

$$\| x_i - y_i \| \le T \quad \text{for } i \in \text{subborder region} \tag{29}$$

where $T$ is a threshold and $x_i$ is the restored image gray
value. Fig. 3(b) shows the result of using this method with
$T = 10$.

Method 2: This method simply sets $\lambda$ in (12) to zero in
the interior region and nonzero in the subborder region,
respectively. Fig. 3(c) shows the result of using this
method with $\lambda = 0.09$. In this case, $D$ was a Laplacian
operator.

Owing to checking all restored image gray values in the
subborder region, Method 1 needs more computation than
Method 2. However, Method 2 is very sensitive to the
parameter $\lambda$, while Method 1 is not so sensitive to the
parameter $\lambda$. Experimental results show that both Methods
1 and 2 reduce the ringing effect significantly by using
the suboptimal blurred boundary values.
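
One possible way to enforce the constraint (29) of Method 1 is to clamp each restored value in the subborder region to within the threshold of the blurred value. The text does not say how the restriction is applied during the iterations, so the clamping step, the function name, and the sample values below are assumptions made for illustration.

```python
def clamp_to_blurred(x_restored, y_blurred, threshold, subborder):
    """Enforce |x_i - y_i| <= threshold for the pixel indices in the subborder region."""
    out = list(x_restored)
    for i in subborder:
        lo = y_blurred[i] - threshold
        hi = y_blurred[i] + threshold
        out[i] = min(hi, max(lo, out[i]))
    return out


# Hypothetical gray values; pixel 1 lies outside the subborder region.
x = [120, 90, 200, 30]
y = [100, 95, 150, 60]
clamped = clamp_to_blurred(x, y, threshold=10, subborder=[0, 2, 3])
```

Pixels outside the subborder region are left untouched, so the constraint only limits ringing near the image border.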


Fig. 4. One typical cut of the restored image using the blurred noisy
boundaries. Solid line for original image, dashed line for blurred noisy
image, and dashed and dotted line for restored image.

Fig. 5. Border, subborder, and interior regions of the image.

VIII.  Comparisons  to  Other  Restoration  Methods 

Comparing the performance of different restoration
methods needs some quality measures which are difficult
to define owing to the lack of knowledge about the human
visual system. The word "optimal" used in the restoration
techniques usually refers only to a mathematical concept,
and is not related to response of the human visual
system. For instance, when the blur function is ill conditioned
and the SNR is low, the MMSE method improves
the SNR, but the resulting image is not visually
good. We believe that human objective evaluation is the
best ultimate judgment. Meanwhile, the mean-square error
or least square error can be used as a reference.

For comparison purposes, we give the outputs of the
inverse filter, SVD pseudoinverse filter, MMSE filter, and
modified MMSE filter using the Gaussian Markov random
field (GMRF) model [19], [5].

A.  Inverse  Filter  and  SVD  Pseudoinverse  Filter 

An inverse filter can be used to restore an image degraded
by a space-invariant blur function with high signal-to-noise
ratio. When the blur function has some singular
points, an SVD pseudoinverse filter is needed;
however, both filters are very sensitive to noise. This is
because the noise is amplified in the same way as the signal
components to be restored. The inverse filter and SVD
pseudoinverse filter were applied to an image degraded by
the 5 × 5 uniform blur function and quantization noise
(about 40 dB SNR). The blurred and restored images are
shown in Fig. 2(b) and (c), respectively. As we mentioned
before, the outputs of these filters are completely
overridden by the amplified noise and ringing effects.

Fig. 6. Comparison to other restoration methods. (a) Image degraded by
5 × 5 uniform blur and 20 dB SNR additive white Gaussian noise. (b)
The restored image using the MMSE filter. (c) The restored image using
the modified MMSE filter. (d) The restored image using our approach.
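
The noise sensitivity of the inverse filter discussed in Section VIII-A can be illustrated with a 1-D analog. The sketch below assumes a 5-tap uniform blur with periodic boundaries on 256 samples (a simplification of the 5 × 5 case) and computes the eigenvalues of the circulant blur matrix with a direct discrete Fourier sum. An inverse filter multiplies each frequency component, noise included, by the reciprocal of the corresponding eigenvalue.

```python
import cmath
import math


def circulant_eigenvalues(first_row):
    """Eigenvalues of a circulant matrix: the DFT of its first row."""
    n = len(first_row)
    return [sum(first_row[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]


def uniform_blur_row(n, width=5):
    """First row of a periodic uniform blur of the given odd width."""
    row = [0.0] * n
    half = width // 2
    for m in range(-half, half + 1):
        row[m % n] = 1.0 / width
    return row


eig = circulant_eigenvalues(uniform_blur_row(256))
smallest = min(abs(e) for e in eig)
worst_gain = 1.0 / smallest  # largest amplification applied by an inverse filter
```

In this 1-D setting no eigenvalue is exactly zero, so the matrix is invertible, yet the smallest one is below 0.01, so some noise components are amplified by a factor of more than one hundred.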

B.  MMSE  and  Modified  MMSE  Filters 

The  MMSE  filter  is  also  known  as  the  Wiener  filter  (in 
the  frequency  domain).  Under  the  assumption  that  the 
original  image  obeys  a  GMRF  model,  the  MMSE  filter 
(or  Wiener  filter)  can  be  represented  in  terms  of  the  GMRF 
model  parameters  and  the  blur  function.  In  our  imple¬ 
mentation  of  the  MMSE  filter,  we  used  a  known  blur 
function,  unknown  noise  variance,  and  the  GMRF  model 
parameters  estimated  from  the  blurred  noisy  image  by  a 
maximum  likelihood  (ML)  method  [19].  The  image  shown 
in  Fig.  6(a)  was  degraded  by  5  x  5  uniform  blur  function 
and  20  dB  SNR  additive  white  Gaussian  noise.  The  re¬ 
stored  image  is  shown  in  Fig.  6(b). 

The  modified  MMSE  filter  in  terms  of  the  GMRF  model 
parameters  is  a  linear  weighted  combination  of  a  Wiener 
filter  with  a  smoothing  operator  (such  as  a  median  filter) 
and a pseudoinverse filter to smooth the noise and preserve
the edge of the restored image simultaneously. Details
of this filter can be found in [5]. We applied the modified
MMSE filter to the same image used in the MMSE
filter  above  with  the  same  model  parameters.  The  smooth¬ 
ing  operator  is  a  9  x  9  cross  shape  median  filter.  The 
resulting  image  is  shown  in  Fig.  6(c). 

The result of our method is also shown in Fig. 6(d).
The $D$ we used in (12) was a Laplacian operator as in
(13). We chose $\lambda = 0.0625$ and used 4 pixel wide blurred
noisy boundaries for restoration. The total number of iterations
was 20. The improvement of mean-square error
between the restored image and the original image for each
method is shown in Table I. In the table, "MMSE
(o)" denotes that the parameters were estimated from the
original image. The restored image using "MMSE (o)"
is very similar to Fig. 6(a). As we mentioned before, the
comparison of the outputs of the different restoration
methods is a difficult problem. The MMSE filter visually
gives the worst output, even though it has the smallest
mean-square error in the MMSE (o) case. The result of our method
is smoother than that of the MMSE filter. Although the
output of the modified MMSE filter is smooth in flat regions,
it contains some artifacts and snake effects at the
edges due to using a large sized median filter.

TABLE I
Mean-Square Error Improvement

Method              MMSE       MMSE (o)   Modified MMSE   Neural Network
Mean-square error   1.384 dB   2.139 dB   1.893 dB        1.682 dB

IX.  Parameter  Learning  for  Linear  Image  Blur 
Model 

Apart  from  fine-grain  parallelism,  fast  (and  preferably 
automatic)  adaptation  of  a  problem-solving  network  to  dif¬ 
ferent  instances  of  a  problem  is  a  primary  motivation  for 
using  a  network  solution.  For  pattern  recognition  and  as¬ 
sociative  memory  applications,  this  weight  training  is 
done  by  distributed  algorithms  that  optimize  a  distance 
measure  between  sample  patterns  and  network  responses. 
However, in feedback networks, general problems that involve learning higher order correlations (like the exclusive or) or combinatorial training sets (like the Traveling Salesperson problem) are difficult to solve and may have exponential complexity. In particular, techniques for finding a compact training set do not exist.

A.  Learning  Model 

For  model-based  approaches  to  “neural”  problem 
solving,  the  weights  of  the  main  network  are  computed 
from  the  parameters  of  the  model.  The  learning  problem 
can  then  be  solved  by  a  parallel,  distributed  algorithm  for 
estimating  the  model  parameters  from  samples  of  the  in¬ 
puts  and  desired  outputs.  This  algorithm  can  be  imple¬ 
mented  on  a  secondary  network.  An  error  function  for  this 
“learning”  network  must  be  constructed,  which  will  now 
be  problem-dependent. 

For  the  linear  shift-invariant  blur  model  (5),  the  prob¬ 
lem  is  that  of  estimating  the  parameters  corresponding  to 
the  blur  function  in  a  K  x  K  small  window  centered  at 
each  pixel.  Rewrite  (5)  as 

$$y(i,j) = z(i,j)^t h + n(i,j), \qquad i, j = 1, 2, \ldots, L \tag{30}$$

where t denotes the transpose operator and z(i, j) and h are $K^2 \times 1$ vectors corresponding to the original image samples in a K x K window centered at (i, j) and to the blur function, respectively.
For instance, for K = 3, we have

$$h = [h_1, h_2, h_3, \ldots, h_9]^t = [h(-1,-1),\ h(-1,0),\ h(-1,1),\ \ldots,\ h(1,1)]^t \tag{31}$$

and

$$z(i,j) = [x(i-1,j-1),\ x(i-1,j),\ x(i-1,j+1),\ \ldots,\ x(i+1,j+1)]^t. \tag{32}$$

We can use an error function for estimation of h, as in the restoration process, because the roles of data {x(i, j)} and parameter h are simply interchanged in the learning process. Therefore, an error function is defined as

$$E = \sum_{(i,j) \in S} \left[ y(i,j) - h^t z(i,j) \right]^2 \tag{33}$$

where S is a subset of {(i, j), i, j = 1, 2, ..., L} and y(i, j) and z(i, j) are training samples taken from the degraded and original images, respectively. The network energy function is given by

$$E = -\sum_{k=1}^{K^2} \sum_{l=1}^{K^2} w_{kl} h_k h_l - \sum_{k=1}^{K^2} \theta_k h_k \tag{34}$$

where $h_k$ are the multilevel parameter activities and $w_{kl}$ and $\theta_k$ are the symmetric weights and bias inputs, respectively. From (33) and (34), we get the weights and bias inputs in the familiar outer-product forms:

$$w_{kl} = -\sum_{(i,j) \in S} z(i,j)_k \, z(i,j)_l \tag{35}$$

$$\theta_k = 2 \sum_{(i,j) \in S} z(i,j)_k \, y(i,j). \tag{36}$$

A greedy, distributed neural algorithm is used for the energy minimization. This leads to a localized multilevel number representation scheme for a general network.

B. Multilevel Greedy Distributed Algorithm

For a $K^2$ neuron second-order network, we choose T discrete activities $\{f_i,\ i = 0, 1, \ldots, T-1\}$ in any arbitrary range of activities (e.g., [0, 1]) where we shall assume without loss of generality that $f_i > f_{i-1}$ for all i. Then, between any two activities $f_m$ and $f_n$ for the kth neuron, we can locally and asynchronously choose the one which results in the lowest energy given the current state of the other neurons because

$$E_{h_k = f_m} - E_{h_k = f_n} = -\left[ r_k + \theta_k + (f_m + f_n) w_{kk} \right] \left[ f_m - f_n \right] \tag{37}$$

where

$$r_k = 2 \sum_{l=1,\ l \neq k}^{K^2} w_{kl} h_l$$

is the current weighted sum from the other neuron activities. Thus, we choose level m over n for m > n if

$$r_k > -\theta_k - (f_m + f_n) w_{kk}. \tag{38}$$

Some properties of this algorithm follow.

1) Convergence is assured as long as the number of levels is not decreasing with time (i.e., assured if coarse to fine).

2) Self-feedback terms are included as level-dependent bias input terms.

3) The method can be easily extended to higher order networks (e.g., based on cubic energies). Appropriate lower order level-dependent networks (like the extra bias input term above) must then be implemented.

The multilevel lowest energy decision can be implemented by using variations of feedforward min-finding networks (such as those summarized in [20]). The space and time complexity of these networks are, in general, $O(T^2)$ and $O(\log T)$, respectively. However, in the quadratic case, it is easy to verify from (38) that we need only implement the decision between all neighboring levels in the set $\{f_i\}$; this requires exactly T neurons with level-dependent inputs. The best activity in the set is then proportional to the sum of the T neuron outputs so that the time complexity for the multilevel decision can be made O(1). This means that this algorithm is similar in implementation complexity (e.g., the number of problem-dependent global interconnects required) to the simple sum energy representation used in [10] and in this paper. Also, in the simple sum case, visiting the neurons for each pixel in sequence will result in conditional energy minimization. Otherwise, from the implementation point of view, the two methods have some properties that are complementary. For example, we have the following.

1) The simple sum method requires asynchronism in the update steps for each pixel, while the greedy method does not.

2) The level-dependent terms arise as inputs in the greedy method as compared to weights in the simple sum method.


C.  Simulation  Results 

The  greedy  algorithm  was  used  with  the  weights  from 
(35)  and  (36)  to  estimate  the  parameters  from  original  and 
blurred  sample  points.  A  5  X  5  window  was  used  with 
two  types  of  blurs:  uniform  and  Gaussian.  Both  real  and 
synthetic  images  were  used,  with  and  without  additive 
Gaussian  noise. 
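
As a rough illustration of the learning procedure, the sketch below builds the weights and bias inputs of (35) and (36) from sample pixels and then runs a sequential greedy update in which each neuron takes the level of lowest energy given the current state of the others. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function names, the row-major ordering of the window, the zero initial state, and the fixed sweep limit are all choices made for this example.

```python
import numpy as np

def learning_weights(x, y, samples, K=3):
    """Weights (35) and bias inputs (36) from sample pixels.

    x: original image, y: degraded image, samples: list of (i, j) pixels
    lying at least K//2 pixels away from the border (assumed).
    """
    r = K // 2
    W = np.zeros((K * K, K * K))
    theta = np.zeros(K * K)
    for i, j in samples:
        # z(i, j): the K x K window of the original image, row-major (assumed)
        z = x[i - r:i + r + 1, j - r:j + r + 1].ravel()
        W -= np.outer(z, z)            # w_kl = -sum z_k z_l
        theta += 2.0 * y[i, j] * z     # theta_k = 2 sum z_k y
    return W, theta

def greedy_estimate(W, theta, T=256, max_sweeps=100):
    """Sequential greedy minimization of E = -h'Wh - theta'h over T levels."""
    levels = np.linspace(0.0, 1.0, T)  # equally spaced activities in [0, 1]
    h = np.zeros(len(theta))           # assumed initial state
    for _ in range(max_sweeps):
        changed = False
        for k in range(len(h)):
            # weighted sum from the other neurons (factor 2: symmetric W)
            r_k = 2.0 * (W[k] @ h - W[k, k] * h[k])
            # part of the energy that depends on h_k, for every level
            e = -W[k, k] * levels ** 2 - (r_k + theta[k]) * levels
            best = levels[np.argmin(e)]
            if best != h[k]:
                h[k] = best
                changed = True
        if not changed:                # no neuron wants to move: stop
            break
    return h
```

Because each neuron scans all T levels directly, this sketch does not reproduce the neighboring-level comparison network described in the text; it only illustrates the energy being minimized.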
TABLE II

Results for Parameter Learning. The Number T of Discrete Activities is 256 for All Tests. A: Arbitrary Choice of Pixels from Image. L: Pixels Chosen from Thresholded Laplacian

Image      Noise  Blur      Samples  Method  Iterations  MSE
Synthetic  -      Gaussian  68       A       49          0.000023
Synthetic  -      Uniform   100      A       114         0.000011
Real       -      Uniform   50       A       94          0.00353
Real       -      Uniform   100      L       85          0.00014
Real       20 dB  Uniform   100      A       72          0.00232
Real       20 dB  Uniform   100      L       83          0.00054


The  estimated  parameters  for  all  types  of  blur  matrices 
were  numerically  very  close  to  the  actual  values  when 
synthetic  patterns  were  used.  The  network  took  longest  to 
converge  with  a  uniform  blur  function.  The  levels  chosen 
for the discrete activity set {f_i} were 128-256 equally
spaced  points  in  [0,  1]  with  50-100  sample  points  from 
the  image.  Results  for  various  cases  are  summarized  in 
Table  II. 

When  the  sample  pixels  were  randomly  chosen,  the  er¬ 
rors  increased  by  two  orders  of  magnitude  for  a  real  image 
[Fig.  2(b)]  as  compared  to  synthetic  ones.  This  is  due  to 
the  smooth  nature  of  real  images.  To  solve  this  problem, 
sample  points  were  chosen  so  as  to  lie  close  to  edges  in 
the  image.  This  was  done  by  thresholding  the  Laplacian 
of  the  image.  Using  sample  points  above  a  certain  thresh¬ 
old  for  estimation  improved  the  errors  by  an  order  of  mag¬ 
nitude.  The  results  were  not  appreciably  degraded  with 
20  dB  noise  in  the  samples. 
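
The edge-based sample selection described above can be sketched as follows. This is an illustrative NumPy example only: the 4-neighbour Laplacian stencil, the border margin, the function names, and the rule of keeping the strongest responses are assumptions, since the text does not specify the operator or the threshold.

```python
import numpy as np

def laplacian(img):
    """4-neighbour discrete Laplacian; border pixels are left at zero."""
    img = np.asarray(img, dtype=float)
    lap = np.zeros_like(img)
    lap[1:-1, 1:-1] = (img[:-2, 1:-1] + img[2:, 1:-1] +
                       img[1:-1, :-2] + img[1:-1, 2:] -
                       4.0 * img[1:-1, 1:-1])
    return lap

def edge_samples(img, threshold, max_samples=100, margin=2):
    """Pick sample pixels whose |Laplacian| exceeds the threshold.

    margin keeps samples away from the border so that a K x K window
    fits around each one (margin = K // 2 is the assumed choice).
    """
    mag = np.abs(laplacian(img))
    rows, cols = mag.shape
    inside = np.zeros_like(mag, dtype=bool)
    inside[margin:rows - margin, margin:cols - margin] = True
    ii, jj = np.nonzero((mag > threshold) & inside)
    # keep the strongest responses first, up to max_samples
    order = np.argsort(mag[ii, jj])[::-1][:max_samples]
    return list(zip(ii[order].tolist(), jj[order].tolist()))
```

On a smooth region the Laplacian is near zero, so such pixels are skipped and the retained samples concentrate near edges.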

X.  Conclusion 

This  paper  has  introduced  a  new  approach  for  the  res¬ 
toration  of  gray  level  images  degraded  by  a  shift-invariant 
blur  function  and  additive  noise.  The  restoration  proce¬ 
dure  consists  of  two  steps:  parameter  estimation  and  im¬ 
age  reconstruction.  In  order  to  reduce  computational  com¬ 
plexity,  a  practical  algorithm  (Algorithm  2),  which  has 
equivalent  results  to  the  original  one  (Algorithm  1),  is  de¬ 
veloped  under  the  assumption  that  the  neurons  are  se¬ 
quentially  visited.  The  image  is  generated  iteratively  by 
updating  the  neurons  representing  the  image  gray  levels 
via a simple sum scheme. As no matrices are inverted, the serious problems of ringing due to the ill-conditioned blur matrix H and of noise overriding caused by an inverse or pseudoinverse filter are avoided by using suboptimal boundary conditions. For the case of a 2-D uniform blur plus small noise, the neural network-based approach gives high-quality images compared to some of the
existing  methods.  We  see  from  the  experimental  results 
that  the  error  defined  by  (12)  is  small,  while  the  error  be¬ 
tween  the  original  image  and  the  restored  image  is  rela¬ 
tively  large.  This  is  because  the  neural  network  decreases 
energy  according  to  (12)  only.  Another  reason  is  that  when 
the  blur  matrix  is  singular  or  ill  conditioned,  the  mapping 
from  X  to  Y  is  not  one  to  one;  therefore,  the  error  measure 
(12)  is  not  reliable  anymore.  In  our  experiments,  when 
the  window  size  of  a  uniform  blur  function  is  3  x  3,  the 


ringing  effect  was  eliminated  by  using  blurred  noisy 
boundary  values  without  any  smoothing  constraint.  When 
the  window  size  is  5  X  5,  the  ringing  effect  was  reduced 
with  the  help  of  the  smoothing  constraint  and  suboptimal 
boundary  conditions.  We  have  also  shown  that  a  smaller 
secondary  network  can  effectively  be  used  for  estimating 
the blur parameters; this provides a more efficient learning technique than Boltzmann machine learning on the primary network.
ABSTRACT

The  incoherent  optical  neuron  (ION)  subtracts  inhibitory  inputs  from  excitatory  inputs  optically  by 
utilizing  separate  device  responses.  Those  factors  that  affect  the  operation  of  the  ION  are  discussed  here,  such 
as  nonlinearity  of  the  inhibitory  element,  input  noise,  device  noise,  system  noise  and  crosstalk.  A  computer 
simulation  of  these  effects  is  performed  on  a  version  of  Grossberg’s  on-center  off-surround  competitive  neural 
network. 


1  Introduction 


The processing of positive and negative signals in optical neural networks has been pursued over the past few years. Existing techniques such as the intensity bias [1] or weight bias methods suffer from input dependent bias or thresholds that must vary from neuron to neuron. A technique described by Te Kolste and Guest [2] eliminates most of these drawbacks in the special case of fully connected networks.

The incoherent optical neuron (ION) [3, 4] model uses separate device responses for inhibitory and excitatory inputs. This is modeled after the biological neuron that processes the excitatory and inhibitory signals by different mechanisms (e.g. chemical-selected receptors and ion-selected gate channels) [5, 6, 7, 8]. By using this architecture, we can realize general optical neuron units with thresholding.


The  ION  comprises  two  elements:  an  inhibitory  (I)  element  and  a  nonlinear  output  (N)  element.  The 
inhibitory  element  provides  inversion  of  the  sum  of  the  inhibitory  signals;  the  nonlinear  element  operates  on  the 
excitatory  signals,  the  inhibitory  element  output,  and  an  optical  bias  to  produce  the  output.  The  inhibitory 
element  is  linear;  the  nonlinear  threshold  of  the  neuron  is  provided  entirely  by  the  output  nonlinear  device. 
Fig. 1(a) and 1(c) show the characteristic curves of the I and N elements, respectively. The structure of the ION model is illustrated in Fig. 1(d). The input/output relationships for the normalized I and N elements, respectively, are given by:


Fig. 1 The ION: (a)-(c) its components, and (d) its structure. (a) Inhibitory (I) element; (b) unnormalized I element; (c) nonlinear (N) element; (d) the ION structure.
$$I^{(I)}_{out} = 1 - I_{inh} \tag{1}$$

$$I^{(N)}_{out} = \psi\left( I^{(I)}_{out} + I_{exc} + I_{bias} - \alpha \right) \tag{2}$$

where $I_{inh}$ and $I_{exc}$ represent the total inhibitory and excitatory inputs, $I_{bias}$ is the bias term for the N element, which can be varied to change the threshold, and $\alpha$ is the offset of the characteristic curve of the N element. $\psi(\cdot)$ represents the output nonlinear function of the N element. If we choose $I_{bias}$ to be $\alpha - 1$, the output of the N element is

$$I^{(N)}_{out} = \psi\left( I_{exc} - I_{inh} \right) \tag{3}$$

which is the desired subtraction. In general, the I element will not be normalized (Fig. 1(b)), in which case the offset and slope of its response can be adjusted using $I_{bias}$ and an attenuating element (ND filter), respectively. The unnormalized I element must have gain greater than or equal to 1. A positive threshold ($\theta$) can be implemented by lowering the bias term by the same amount $\theta$. Similarly, a negative threshold is realized by increasing the bias term by $\theta$.
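
The subtraction performed by the normalized ION can be checked numerically with a short sketch. The code below is illustrative only: the saturating-ramp nonlinearity, the default offset value, and the function names are assumptions for the example, not device characteristics taken from the paper.

```python
import numpy as np

def i_element(i_inh):
    """Normalized inhibitory element: linear inversion of the input."""
    return 1.0 - i_inh

def n_element(total_input, alpha, psi):
    """Nonlinear output element: psi applied after removing the offset."""
    return psi(total_input - alpha)

def ion_output(i_exc, i_inh, alpha=1.5, threshold=0.0, psi=None):
    """Output of a normalized ION with bias alpha - 1 - threshold.

    alpha (curve offset) and psi (output nonlinearity) are hypothetical
    choices; a positive threshold lowers the bias by the same amount.
    """
    if psi is None:
        psi = lambda u: np.clip(u, 0.0, 1.0)   # assumed saturating ramp
    i_bias = alpha - 1.0 - threshold
    return n_element(i_element(i_inh) + i_exc + i_bias, alpha, psi)
```

With zero threshold the argument of the nonlinearity reduces to the excitatory input minus the inhibitory input, independent of the chosen offset.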

The  ION  model  can  be  implemented  using  separate  devices  for  the  I  and  N  elements  (heterogeneous  case), 
or  by  using  a  single  device  with  a  nonmonotonic  response  to  implement  both  elements  (homogeneous  case). 
Possible devices include bistable optical arrays [9, 10, 11, 12] and SLMs such as liquid crystal light valves (LCLV) [13]. A single Hughes liquid crystal light valve can be used to implement both elements (Fig. 2).

Several factors that affect the realization of a neural network based on the ION concept are examined here.
These  include  deviation  from  linearity  within  the  inhibitory  element,  residual  noise  of  the  optical  device,  input 
noise,  drift  of  the  operation  point  of  the  device,  and  system  noise.  A  noise  model  for  the  ION  is  proposed  and 
a  computer  simulation  of  these  effects  on  a  version  of  Grossberg’s  on-center  off-surround  type  network  [14]  is 
performed. 


2  Factors  that  Affect  the  ION  Operation 

2.1  Nonlinearity  in  I  Element 

In  order  to  perform  subtraction  correctly,  we  need  a  linear  I  element.  Fig.  2  shows  the  typical  input  output 
characteristic  curve  of  the  LCLV  [15],  which  is  nonlinear  in  the  inversion  region.  In  this  region,  the  characteristic 
curve  can  be  modeled  as 


$$I^{(I)}_{out} = 1 - I_{inh} - E_r(I_{inh}) \tag{4}$$

where $E_r(I_{inh})$ denotes the error term, which can be treated as an input dependent deterministic noise. If the transfer curve is time varying, then it can be treated as temporally correlated random noise.


Fig.  2  Characteristic  of  a  Hughes  twisted-nematic  liquid  crystal 
light  valve.  The  negative  slope  region  can  implement  the 
I  element,  and  the  positive  slope  region,  with  appropriate 
input  optical  bias,  can  implement  the  N  element. 


2.2  Noise 


Here  we  use  “noise”  to  mean  “any  undesired  signals”,  including  perturbation  of  the  operating  point  of  the  device, 
non-uniformity  of  the  device,  variation  in  operating  characteristics  from  device  to  device  due  to  production 
variation,  environmental  effects  etc.  Some  of  these  effects  are  global  (they  affect  all  neuron  units  on  a  device 
identically),  others  are  localized  (each  neuron  unit  behaves  differently;  the  noise  on  neighboring  neuron  units  on 
a  device  may  be  independent  or  correlated,  depending  on  the  source  of  the  noise).  Both  temporal  and  spatial 
characteristics  of  the  noise  need  to  be  included.  The  effect  of  noise  on  an  additive  lateral  inhibitory  network 
was discussed by Stirk, Rovnyak and Athale [16]. Here, we construct a noise model for the ION by considering
the  origin  and  impact  of  the  noise  sources. 

The  possible  noise  sources  in  the  ION  model  can  be  classified  into  four  categories:  input  noise,  device  noise, 
system noise and coupling noise. The input noise includes environmental background noise, residual output of the optical devices, etc. These sources are generally not zero mean and vary slowly with time. The device noise
is  mainly  caused  by  uncertainty  in  the  device’s  characteristics,  for  example  drift  of  the  operating  point  and 
variation  of  gain,  due  to  temperature  or  other  effects.  The  system  noise  has  global  effect  on  all  neuron  units  on 
an  optical  device  and  includes  fluctuations  in  the  optical  source.  Finally,  the  coupling  noise  (crosstalk)  is  due 
to  poor  isolation  between  the  optical  neuron  units,  crosstalk  from  the  interconnection  network,  and  imperfect 
learning.  As  noted  in  [16],  alignment  inaccuracies  and  imperfect  focussing  and  collimating  optics  also  cause 
localized  crosstalk.  Coupling  noise  is  signal  dependent. 

2.3  Noise  Model  for  the  ION 

Device  Input  Noise 

Let the environmental background noise for the I and N elements be denoted by $N_b^{(I)}$ and $N_b^{(N)}$ respectively. The total residual output noise, caused by the optical devices to the input of an incoherent optical neuron, is $N_r = \sum_{j=1}^{N_{in}} W_{ij} I_r / N_{out}$, which is weight-dependent and varies slowly with time due to learning. $W_{ij}$ is the interconnection strength from neuron j to neuron i. $I_r$ is the residual output of the optical device (Fig. 1(c)). $N_{in}$ and $N_{out}$ denote the fan-in and fan-out of the optical neuron unit respectively. Perturbation of the weights can be treated as an input dependent noise source as $N_w = \sum_j \Delta W_{ij}$, where each $\Delta W_{ij}$ is independent. For the interconnection network, imperfect learning of the weights, nonuniformity of the weights, residual weights after reprogramming and perturbation of the reference beam intensity will cause weight noise. Then the output of the ION for the case of normalized characteristics is

$$I_{out} = \psi\left\{ \left[ 1 - \left( I_{inh} + N_b^{(I)} + N_r^{(I)} + N_w^{(I)} \right) \right] + \left[ I_{exc} + N_b^{(N)} + N_r^{(N)} + N_w^{(N)} \right] + (\alpha - 1) - \alpha \right\} \tag{5}$$

If the background noise is space invariant and the I and N elements have the same device area, the terms $N_b^{(I)}$ and $N_b^{(N)}$ will cancel out. The residual noise terms $N_r^{(I)}$ and $N_r^{(N)}$, and the weight noise terms $N_w^{(I)}$ and $N_w^{(N)}$, generally do not cancel.

Device  Noise 


There are two possible noise sources in the I element, as illustrated in Fig. 3(a) and (b): shift (drift) and gain variation in the device characteristics, which are denoted as $N_d^{(I)}$ and $N_g^{(I)}$ respectively. For the output N element, the gain variation (Fig. 3(e)) only modifies the nonlinearity of the element N. If this gain variation is a slowly varying effect, it will have little effect on the dynamic behavior of the network; so for the N element we only consider the drift effect. Let us denote it by $N_d^{(N)}$. Two different drifts in the N element are possible, horizontal drift ($N_{dh}^{(N)}$) (Fig. 3(c)) and vertical drift ($N_{dv}^{(N)}$) (Fig. 3(d)). The vertical drift of one neuron unit becomes an additive noise at the input of the next neuron unit, and so will be approximated by including it in the residual noise term above. The horizontal drift has the same effect as a perturbation in the bias term, denoted by $N_{bp}$.

If the gain variation is small, the output of the I element can be expressed as $(1 + N_g^{(I)}) - (1 + N_g^{(I)}) I_{inh}$, where $N_g^{(I)}$ denotes the gain noise.


Fig. 3 Modeling the device noise of an incoherent optical neuron. (a) Drift (vertical/horizontal) of the I element; (b) gain variation effect of the I element; (c) horizontal drift of the N element or variation of the bias; (d) vertical drift of the N element; (e) gain variation effect of the N element.


System  Noise 

System  noise  has  a  global  effect  on  the  ION.  If  it  is  caused  by  an  uncertainty  in  the  optical  source,  it  causes 
a  variation  in  the  characteristic  curve  of  the  device  and  a  perturbation  in  the  bias  term.  In  the  case  of  an  LCLV, 
a  perturbation  in  the  reading  beam  intensity  produces  a  gain  variation  in  the  I  element  and  a  combination  of 
gain  variation  and  horizontal  drift  in  the  N  element  (or  equivalently,  essentially  a  re-scaling  of  its  input  axis). 
Gain variation of the I element was discussed above. The difference is that the device gain variation is local, i.e.,
it  varies  from  one  neuron  unit  to  another  on  a  given  device,  while  the  variation  in  gain  due  to  system  noise  is 
global. 

The  device  noise  and  system  noise  can  be  modeled  as 

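$$I_{out} = \psi\left\{ \left( 1 + N_g^{(I)} \right) \left[ 1 + N_d^{(I)} - I_{inh} \right] + I_{exc} + N_d^{(N)} + \left( \alpha - 1 + N_{bp} \right) - \alpha \right\} \tag{6}$$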


Crosstalk 


Crosstalk  can  be  caused  by  the  physical  construction  of  the  interconnection  network  (e.g.,  coupling  between 
different  holograms,  diffraction  in  the  detection  (neuron  unit  inputs)  plane,  inaccurate  alignment  and  focussing). 
It  can  also  be  caused  by  imperfect  learning  or  reprogramming  of  the  synaptic  weights,  where  the  perturbation 
of  different  weights  are  correlated.  In  general,  crosstalk  is  signal  dependent  and  varies  from  one  neuron  unit  to 
another  on  a  given  device.  It  can  be  modeled  as  an  input  noise  to  the  I  and  N  elements.  It  is  excluded  from 
our  current  simulations  because  it  is  signal  dependent. 

Based on the above discussion, these noise terms can be grouped into additive ($N_I^{+}$) and multiplicative ($N_I^{\times}$) noise of the I element and additive noise ($N_N^{+}$) of the output element N. The general noise model of the ION can be written as

$$I_{out} = \psi\left\{ \left( 1 + N_I^{\times} \right) \left[ 1 - I_{inh} + N_I^{+} \right] + I_{exc} + N_N^{+} - 1 \right\} \tag{7}$$

where $N_I^{+}$ is the sum of the drift noise ($N_d^{(I)}$), background noise ($N_b^{(I)}$), residual noise ($N_r^{(I)}$), weight noise ($N_w^{(I)}$) and crosstalk noise of the I element; $N_I^{\times}$ is the gain noise of the I element; and $N_N^{+}$ is the sum of the background ($N_b^{(N)}$) and residual input noise, horizontal shift noise ($N_{dh}^{(N)}$), weight noise and crosstalk noise of the N element, and bias noise ($N_{bp}$).
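
The grouped noise model of Eq. (7) lends itself to a small numerical sketch. The example below is an assumption-laden illustration rather than the simulation code used in the paper: the function name, the common standard deviation `sigma` for all three sources, and the saturating-ramp nonlinearity are hypothetical choices.

```python
import numpy as np

def noisy_ion_output(i_exc, i_inh, sigma, rng, psi=None):
    """One evaluation of the grouped ION noise model for an array of neurons.

    Three independent zero-mean Gaussian sources are drawn per neuron:
    additive and multiplicative noise on the I element and additive noise
    on the N element. Using one sigma for all three is an assumption.
    """
    if psi is None:
        psi = lambda u: np.clip(u, 0.0, 1.0)   # assumed saturating ramp
    i_exc = np.asarray(i_exc, dtype=float)
    i_inh = np.asarray(i_inh, dtype=float)
    n_i_add = rng.normal(0.0, sigma, i_inh.shape)   # additive, I element
    n_i_mul = rng.normal(0.0, sigma, i_inh.shape)   # multiplicative, I element
    n_n_add = rng.normal(0.0, sigma, i_exc.shape)   # additive, N element
    return psi((1.0 + n_i_mul) * (1.0 - i_inh + n_i_add)
               + i_exc + n_n_add - 1.0)
```

Setting `sigma` to zero recovers the ideal subtraction, which gives a convenient reference output when measuring the error caused by the noise.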


3  Computer  Simulation 

3.1  Compensation  for  the  Nonlinearity  of  the  I  Element 

To  assess  the  effect  of  imperfect  device  responses  for  the  I  element,  we  have  performed  simulations  on  a  variant  of 
Grossberg’s  on-center  off-surround  competitive  network  [14]  for  edge  detection  (Fig.  4).  The  network  contains 
30  inner-product  type  neurons  connected  in  a  ring  structure  with  input  and  lateral  on-center  off-surround 
connections.  Fig.  5  shows  several  modeled  nonlinear  characteristic  curves  for  the  I  element;  these  approximate 
the  normalized  response  of  a  Hughes  liquid-crystal  light  valve.  An  attenuator  (neutral  density  filter)  can  be 
placed  in  front  of  the  I  element  to  reduce  the  overall  gain  (by  effectively  re-scaling  the  input  axis)  to  bring  it 
closer  to  the  ideal  response.  Fig.  6  shows  computer  simulations  of  the  network  responses  based  on  nonlinear 
curve  #2  for  different  input  attenuations  and  input  levels.  As  shown  here,  the  attenuation  has  a  tolerance 
of approximately ±20%. For extremely nonlinear responses we expect an input bias and a limited region of operation to provide a sufficiently linear response. In our simulations we used an attenuator but no input bias to the I element; the region of operation for these curves was seen to extend over most of the input range of the device [3, 4].

Fig. 4 On-center off-surround competitive neural network.

Fig. 5 Four curves used for simulating the effect of nonlinearities in the I element.

Fig. 6 Network responses for different attenuation factors, S_I, at the input to the nonlinear inhibitory element. a) input patterns, b) S_I = 1.0 (no attenuation), c) S_I = 1.5, d) S_I = 2.5, e) S_I = 3.0, f) S_I = 3.5. The ideal output is essentially identical to (d).

3.2  Noise  Effect 

We use the same network to test the effect of noise (the results are, of course, network dependent) to get an idea of the noise immunity and robustness to physical imperfections of the ION model. In the computer simulation, each of the three noise sources in Eq. (7) is assumed to be independently distributed Gaussian with zero mean. We define the maximum perturbation, p, of the noise source as twice the standard deviation, expressed as a percentage of the input signal level. A normalized mean square error (nmse) is used to measure the acceptability of the result. Although it is not a perfect measure, an nmse less than 0.1-0.15 generally looks acceptable for the network response for our input test pattern.

Fig. 7(a) shows the nmse vs. percentage of maximum noise perturbation for the input level of 0.7 and for noise that is correlated over different time periods T. The noise sources for each neuron are assumed independent and identically distributed (i.i.d.). Each noise source is temporally correlated with its previous values, as given by $N(t+1) = \sum_i h_i \cdot N(t+1-i)$. The correlation coefficients $h_i$ decrease linearly with i (to $h_T = 0$).
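
One way to generate noise of this kind is sketched below. It is only an approximation of the description above: the recurrence in the source is not fully legible, so the sketch assumes a moving average of white Gaussian samples with T linearly decreasing coefficients, rescaled so that the output keeps the requested standard deviation. The function name and the treatment of p as a fraction (0.10 for ±10%) are also assumptions.

```python
import numpy as np

def correlated_noise(n_steps, T, p, input_level, rng):
    """Zero-mean Gaussian noise correlated over roughly T time steps.

    p is the maximum perturbation (twice the standard deviation) given as
    a fraction of the input level.
    """
    sigma = 0.5 * p * input_level
    # T coefficients decreasing linearly from 1 toward h_T = 0 (h_T dropped)
    h = np.linspace(1.0, 0.0, T + 1)[:-1]
    h /= np.sqrt(np.sum(h ** 2))       # keep the output variance at sigma^2
    white = rng.normal(0.0, sigma, n_steps + T - 1)
    # "valid" convolution returns exactly n_steps correlated samples
    return np.convolve(white, h, mode="valid")
```

For T = 1 the coefficient vector reduces to a single tap and the output is plain white noise, which corresponds to the temporally uncorrelated case.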

In  Fig.  7(a),  all  three  noise  sources  in  our  model  are  present  and  have  the  same  variance.  If  the  acceptance 
nmse  criterion  is  0.15,  a  perturbation  of  ±10%  on  each  noise  source  yields  an  acceptable  result  in  all  cases.  For 
T  =  50,  the  nmse  increases  as  the  input  level  and  noise  variance  increase  as  shown  in  Fig.  7(b).  The  network 
responses  are  shown  in  Fig.  8  for  temporally  correlated  noise  with  perturbation  of  ±10%. 


Fig. 7 a) Normalized mean square error of the net output vs. maximum noise perturbation p for correlation periods (T) ranging from 1 to 50. The input level is 0.7. b) Output nmse plot for different noise perturbations and input levels (T=50).

In some cases, the noise is spatially correlated. We simulated the network with spatially correlated noise. The spatial correlation is assumed to have a Gaussian profile. Fig. 9(a) and (b) are the responses for a spatial correlation range of 5 and 13 respectively, while Fig. 9(c) and (d) show the responses for spatially and temporally correlated noise.

Drift of the device characteristic is a global effect. Fig. 10 simulates slowly varying and quickly varying drift on this network. Fig. 11 shows the effect of local gain variation that is spatially correlated. A ±25% perturbation in drift is apparently acceptable, and a ±15-20% perturbation in gain is acceptable.

4  Discussions  and  Conclusions 

We have summarized sources of noise for the ION and proposed a noise model. From the result of the computer simulation, it seems that the example network performs much better for quickly varying (i.e., temporally uncorrelated) noise than for temporally correlated (more slowly varying) noise. Due to the static input pattern and the competitive nature of the network, once the noise term has survived a number of iterations, then it will continue to get stronger and will not die out. We speculate that if the input to the network is time varying, then the slowly varying noise source is effectively an offset response of the network and might be adaptively overcome by the network, while the quickly varying noise interacts with the input patterns and is more difficult to compensate.

Fig. 8 The network response for temporally correlated noise sources. Three noise sources are simulated with the same maximum noise perturbation of ±10%. T is the correlation period of the noise and nmse is the normalized mean square error of the output. Here, the given value of nmse corresponds to the maximum input level case. a) T=1, nmse = 0.07, b) T=10, nmse = 0.12, c) T=25, nmse = 0.14, d) T=50, nmse = 0.16.

Fig. 9 Simulation result of spatially and temporally correlated noise. sc is the spatial correlation range. Three noise sources are simulated simultaneously with p = ±10% and correlation period T. a)-b) only spatially correlated noise with a) sc = 3, nmse = 0.08, b) sc = 13, nmse = 0.13, c)-d) are the result of spatially and temporally correlated noise (T=25) with c) sc = 3, nmse = 0.14, d) sc = 13, nmse = 0.16.

Fig. 10 Effect of device drift in the ION. p is the maximum perturbation of the noise source. T is the temporal correlation period of the drift. The drift effect is uniform over all neuron units. a) and b) simulate high frequency drift (T=1) with a) p = ±10%, nmse = 0.02, b) p = ±25%, nmse = 0.09, c)-d) are the low frequency drift (T=50) with c) p = ±10%, nmse = 0.07, d) p = ±25%, nmse = 0.11.

Fig. 11 The gain variation effect of the ION. sc and p are the correlation range and maximum perturbation percentage of the noise source. a)-b) high frequency gain variation with a) T=1, sc = 3, p = ±10%, nmse = 0.04, b) T=1, sc = 3, p = ±25%, nmse = 0.11. c)-d) low frequency variation with c) T=25, sc = 9, p = ±10%, nmse = 0.09, d) T=25, sc = 9, p = ±25%, nmse = 0.18.

For  noise  that  is  correlated,  we  have  found  that  the  qualitative  effect  of  each  of  the  three  noise  sources 
(additive  inhibitory,  multiplicative  inhibitory,  and  additive  excitatory)  on  the  output  of  the  net  is  essentially 
the  same.  Since  one  of  the  noise  terms,  Nf},  is  the  same  for  a  conventional  neuron  implementation  as  for  the  ION, 
it  appears  that  an  ION  implementation  is  not  significantly  different  from  a  conventional  neuron  implementation 
in  terms  of  immunity  to  noise  and  device  imperfections,  for  a  given  technology.  We  also  see  that  the  output 
is  affected  primarily  by  the  variance  of  the  noise  and  by  the  degree  of  spatial  and  temporal  correlation,  but 
apparently  not  by  the  source  of  the  noise.  We  conjecture  that  this  result  is  not  peculiar  to  the  ION  model,  but 
is true of other neuron implementations as well.

ABSTRACT

The design of an optical computer must be based on the characteristics of optics and optical
technology, and not on those of electronic technology. The property of optical superposition is
considered, and its implications for the design of computing systems are discussed. It can be
exploited in the implementation of optical gates, interconnections, and shared memory.


INTRODUCTION 

Fundamental differences in the properties of electrons and photons provide for expected
differences in computational systems based on these elements. Some, such as the relative ease with
which optics can implement regular, massively parallel interconnections, are well known. In this
paper we examine how the property of superposition of optical signals in a linear medium can be
exploited in building an optical or hybrid optical/electronic computer. This property enables many
optical signals to pass through the same point in space at the same time without causing mutual
interference or crosstalk. Since electrons do not have this property, this helps to shed more light
on the role that optics could play in computing. We will separately consider the use of this
property in interconnections, gates, and memory.


INTERCONNECTIONS 

A technique for implementing optical interconnections from one 2-D array to another (or within
the same array) has been described [Jenkins et al., 1984]. It utilizes two holograms in succession
(Fig. 1). The holograms can be generated by a computer plotting device. The idea is to define a
finite number, M, of distinct interconnection patterns, and then assemble the interconnecting
network using only these M patterns. The second hologram of Fig. 1 consists of an array of facets,
one for each of the M interconnection patterns. The first hologram contains one facet for each
input node, and serves to address the appropriate patterns in the second hologram.

It is the superposition property that makes this interesting. Note that many different signal
beams can pass through the same facet of the second hologram at the same time without causing
mutual interference. (All of these signals merely get shifted in the same direction and by the same
amount.) This feature decreases the complexity of both holograms: the first because it only has to
address M facets, the second because it only has M facets. Let N be the number of nodes in the
input and output arrays. The complexity (number of resolvable spots) of each hologram can be shown
to be proportional to NM, with the proportionality constant being approximately 25
[Jenkins et al., 1984].

Using this as a model for interconnections in parallel computing, the complexity of these
optical interconnections can be compared with that of electronic VLSI for various interconnection
networks. Results of this comparison have been given in [Giles and Jenkins, 1986]. It is found
that in general the optical interconnections have an equal or lower space complexity than
electronic interconnections, with the difference becoming more pronounced as the connectivity
increases.

SHARED MEMORY

The  same  superposition  principle  can  be  applied  to  memory  cells,  where  many  optical  beams  can 
read  the  same  memory  location  simultaneously.  This  concept  could  be  useful  in  building  a  parallel 
shared  memory  machine. 

For  this  concept,  we  first  consider  abstract  models  of  parallel  computation  based  on  shared 
memories.  The  reason  for  this  approach  is  to  abstract  out  inherent  limitations  of  electronic  technology 
(such  as  limited  interconnection  capability);  in  designing  an  architecture  one  would  adapt  the  abstract 
model  to  the  limitations  of  optical  systems.  These  shared  memory  models  are  basically  a  paralleliza¬ 
tion  of  the  Random  Access  Machine. 

The Random Access Machine (RAM) model [Aho, Hopcroft, and Ullman, 1974] is a model of
sequential computation, similar to but less primitive than the Turing machine (TM). The RAM model
is a one-accumulator computer in which the instructions are not allowed to modify themselves. A RAM
consists of a read-only input tape, a write-only output tape, a program and a memory. The time on
the RAM is bounded above by a polynomial function of time on the TM. The program of a RAM is not
stored in memory and is unmodifiable. The RAM instruction set is small and consists of operations
such as store, add, subtract, and jump if greater than zero; indirect addresses are permitted. A
common RAM model is the uniform cost one, which assumes that each RAM instruction requires one unit
of time and each register one unit of space.
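
As an illustration only, the uniform cost RAM described above can be sketched as a tiny interpreter. The opcode names (LOAD, LOADN, STORE, ADD, SUB, JGTZ, HALT), the tuple encoding of instructions, and the names `run_ram` and `SUM_PROGRAM` are assumptions made for this sketch; they are not the instruction set of [Aho, Hopcroft, and Ullman, 1974]. The program is held outside the memory, so it cannot modify itself, and every executed instruction is charged one unit of time.

```python
def run_ram(program, memory=None, max_steps=10_000):
    """Run a toy one-accumulator RAM and return (memory, time_units).

    program : list of (opcode, operand) tuples, kept separate from memory
    memory  : dict mapping register index -> integer
    Uniform cost: each executed instruction costs one unit of time.
    """
    mem = dict(memory or {})
    acc = 0      # the single accumulator
    pc = 0       # program counter
    steps = 0
    while pc < len(program) and steps < max_steps:
        op, arg = program[pc]
        steps += 1
        if op == "LOAD":        # acc <- mem[arg]
            acc = mem.get(arg, 0)
        elif op == "LOADN":     # indirect address: acc <- mem[mem[arg]]
            acc = mem.get(mem.get(arg, 0), 0)
        elif op == "STORE":     # mem[arg] <- acc
            mem[arg] = acc
        elif op == "ADD":
            acc += mem.get(arg, 0)
        elif op == "SUB":
            acc -= mem.get(arg, 0)
        elif op == "JGTZ":      # jump if accumulator is greater than zero
            if acc > 0:
                pc = arg
                continue
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc += 1
    return mem, steps


# Hypothetical example program: adds n + (n-1) + ... + 1 into register 1.
# Register 0 holds n, register 2 holds the constant 1.
SUM_PROGRAM = [
    ("LOAD", 1), ("ADD", 0), ("STORE", 1),   # sum <- sum + n
    ("LOAD", 0), ("SUB", 2), ("STORE", 0),   # n <- n - 1
    ("JGTZ", 0),                             # repeat while n > 0
    ("HALT", None),
]
```

Running `run_ram(SUM_PROGRAM, {0: 3, 1: 0, 2: 1})` leaves the sum in register 1 and reports the number of instructions executed as the uniform-cost running time.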

Shared memory models are based on global memories and are differentiated by their accessibility
to memory. In Fig. 2 we see a typical shared memory model where individual processing elements
(PE's) have variable simultaneous access to an individual memory cell. Each PE can access any cell
of the global memory in unit time. In addition, many PE's can access many different cells of the
global memory simultaneously. In the models we discuss, each PE is a slightly modified RAM without
the input and output tapes, and with a modified instruction set to permit access to the global
memory. A separate input for the machine is provided. A given processor can generally not access
the local memory of other processors.


Fig. 2. Conceptual diagram of shared memory models.

Fig. 3. One memory cell of an array, showing multiple optical beams providing contention-free
read access.


The various shared memory models differ primarily in whether they allow simultaneous reads
and/or writes to the same memory cell. The PRAC, parallel random access computer [Lev, Pippenger,
and Valiant, 1981], does not allow simultaneous reading or writing to an individual memory cell.
The PRAM, parallel random access machine [Fortune and Wyllie, 1978], permits simultaneous reads
but not simultaneous writes to an individual memory cell. The WRAM, parallel write random access
machine, denotes a variety of models that permit simultaneous reads and certain writes, but differ
in how the write conflicts are resolved. For example, a model by Shiloach and Vishkin (1981) allows
a simultaneous write only if all processors are trying to write the same value. The paracomputer
[Schwartz, 1980] has simultaneous writes but only "some" of all the information written to the cell
is recorded. The models represent a hierarchy of time complexity given by

$$T_{PRAC} \ge T_{PRAM} \ge T_{WRAM}$$

where  T  is  the  minimum  number  of  parallel  time  steps  required  to  execute  an  algorithm  on  each  model. 
More  detailed  comparisons  are  dependent  on  the  algorithm  [Borodin  and  Hopcroft,  1985]. 
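
The access rules that separate these models can be made concrete with a small checker. This is a minimal sketch under stated assumptions: the function name `step_is_legal`, the request encoding, and the choice of the Shiloach and Vishkin rule as the WRAM variant are all illustrative, and the paracomputer is not modelled.

```python
from collections import defaultdict


def step_is_legal(model, reads, writes):
    """Check whether one parallel step is allowed under a shared memory model.

    model  : "PRAC", "PRAM", or "WRAM" (WRAM here = common-value write rule)
    reads  : list of (pe, cell) read requests issued in this step
    writes : list of (pe, cell, value) write requests issued in this step
    """
    readers = defaultdict(set)     # cell -> PEs reading it
    writers = defaultdict(dict)    # cell -> {pe: value written}
    for pe, cell in reads:
        readers[cell].add(pe)
    for pe, cell, value in writes:
        writers[cell][pe] = value

    for cell in set(readers) | set(writers):
        r = readers.get(cell, set())
        w = writers.get(cell, {})
        if model == "PRAC":
            # no simultaneous access of any kind to one cell
            if len(r | set(w)) > 1:
                return False
        elif model == "PRAM":
            # simultaneous reads allowed, simultaneous writes forbidden
            if len(w) > 1:
                return False
        elif model == "WRAM":
            # simultaneous writes allowed only if all values agree
            if len(set(w.values())) > 1:
                return False
        else:
            raise ValueError(f"unknown model {model!r}")
    return True
```

A step that is legal on a weaker model is legal on the stronger ones, which is why an algorithm never needs more parallel steps as one moves from PRAC to PRAM to WRAM.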


In general, none of these shared memory models are physically realizable because of actual fan-in
limitations. As an electronic example, the ultracomputer [Schwartz, 1980] is an architectural
manifestation of the paracomputer that uses a hardwired Omega network between the PE's and
memories; it simulates the paracomputer within a time penalty of O(log2n). The current IBM RP3
project is a continuation of the (initial) work on the ultracomputer.

Optical systems could in principle be used to implement this parallel memory read capability. As
a simple example, a single 1-bit memory cell can be represented by one pixel of a 1-D or 2-D array;
the bit could be represented by the state (opaque or transparent) of the memory cell. Many optical
beams can simultaneously read the contents of this memory cell without contention (Fig. 3). In
addition to this, an interconnection network is needed between the PE's and the memory that can
allow any PE to communicate with any memory cell, preferably in one step, and with no contention. A
regular crossbar is not sufficient for this because fan-in to a given memory cell must be allowed.
Figure 4 shows a conceptual block diagram of a system based on the PRAM model; here the memory
array operates in reflection instead of transmission. The fan-in required of the interconnection
network is also depicted in the figure.


Fig. 4. Block diagram of an optical architecture based on parallel RAM models. (Blocks: PE's;
dynamic interconnection network; memory array.)

Fig. 5. Example of an optical crossbar interconnection network. (Label: mask.)

Optical systems can potentially implement crossbars that also allow this fan-in. Several optical
crossbar designs discussed in [Sawchuk et al., 1986] exhibit fan-in capability. An example is the
optical crossbar shown schematically in Fig. 5; it is based on earlier work on optical
matrix-vector multipliers. The 1-D array on the left could be optical sources (LED's or laser
diodes) or just the location of optical signals entering from previous components. An optical
system spreads the light from each input source into a vertical column that illuminates the
crossbar mask. Following the crossbar mask, a set of optics collects the light transmitted by each
row of the mask onto one element of the output array. The states of the pixels in the crossbar mask
(transparent or opaque) determine the state of the crossbar switch. Multiple transparent pixels in
a column provide fanout; multiple transparent pixels in a row provide fan-in. Many optical
reconfigurable network designs are possible, and provide tradeoffs in performance parameters such
as bandwidth, reconfiguration time, maximum number of lines, hardware requirements, etc.
Unfortunately, most simple optical crossbars will be limited in size to approximately 256 x 256
[Sawchuk et al., 1986]. We are currently considering variants of this technique to increase the
number of elements. Possibilities include using a multistage but nonblocking interconnection
network (e.g., Clos), a hierarchy of crossbars, and/or a memory hierarchy.
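
The behaviour of the crossbar mask can be sketched as a Boolean matrix-vector product. This is only an illustrative model: the function name `crossbar`, the row/column indexing, and the assumption that an output element registers "on" whenever any light reaches it (a thresholded detector) are not taken from the cited designs.

```python
def crossbar(mask, inputs):
    """Model a binary crossbar mask.

    mask[r][c] is True when the pixel in row r, column c is transparent,
    i.e. input c is connected to output r.
    inputs[c] is True when input source c is emitting light.
    Returns the list of output states, one per mask row.
    """
    n_inputs = len(inputs)
    outputs = []
    for row in mask:
        # light from every column passing through this row is collected
        # onto one output element (fan-in is the OR over the row)
        lit = any(row[c] and inputs[c] for c in range(n_inputs))
        outputs.append(lit)
    return outputs


# Column 0 has two transparent pixels (fanout to outputs 0 and 2);
# row 0 has two transparent pixels (fan-in from inputs 0 and 1).
EXAMPLE_MASK = [
    [True,  True,  False],
    [False, False, True],
    [True,  False, False],
]
```

With `EXAMPLE_MASK`, turning on input 0 alone lights outputs 0 and 2, while turning on input 1 alone lights only output 0.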

GATES 

Since  the  superposition  property  of  optics  only  applies  in  linear  media,  it  cannot  in  general  be 
used  for  gates,  which  of  course  are  inherently  nonlinear.  However,  for  important  special  cases  super¬ 
position  can  allow  many  optical  gates  to  be  replaced  with  one  optical  switch. 

Consider again the situation depicted in Fig. 3, with the aperture being used as a switch or
relay. The control beam opens or closes the relay; when the relay is closed (i.e., the aperture is
transparent), many optical signal beams can independently pass through the relay. If $b$ represents
the control beam and $a_i$ the signal beams, this in effect computes $b \cdot a_i$ or
$\bar{b} \cdot a_i$, depending on which state of $b$ closes the relay, where $\cdot$ denotes the
AND operation (Fig. 6).

Fig. 6. One optical relay or superimposed gate versus individual gates with a common input.

Using this concept, a set of gates with a common input in a single-instruction multiple-data
(SIMD) machine can be replaced with one optical switch or "superimposed gate". An example of this
is in the control signals; instead of broadcasting each instruction or control bit to all PE's, a
fan-in from all PE's to a common control switch is performed. Thus, for $I$ control bits per
instruction word, $I$ superimposed gates could replace $NI$ gates ($I$ per PE). Since for optical
or hybrid systems we expect $N \gg I$, this can be a substantial reduction. Fig. 7 shows an example
of how this can be incorporated into fixed optical interconnections (such as those of Fig. 1). In
the figure there are four PE's laid out on a 2-D array of gates. Each PE sends a signal through one
pixel of a transmissive spatial light modulator (SLM). The SLM is electrically addressed, so that
the instructions can come from an electronic host. After passing through a common superimposed gate
corresponding to the control bit, the signals proceed to the appropriate gate inputs in the gate
input array. In this case the second hologram $H_2$ deflects the signals to the desired gate inputs
(gates different from those from which they came). This optical system is identical to that of
Fig. 1 except for the introduction of the SLM for control bits; thus the systems are compatible.
Note also that the fanout of each gate in this process is one; a conventional implementation with a
large number of PE's would require very high fanout capability or else a tree of gates for each
control bit to provide the fanout.

Fig. 7. An optical architecture for the incorporation of superimposed gates for instruction or
control bits. The optics are omitted for clarity but are identical to those of Fig. 1. Signals from
four gates are shown that fan in to a common control bit $b$.

These superimposed gates are not true 3-terminal devices. The common ($b$) input is regenerated,
but the $a_i$ inputs are not. As a result, the design must ensure that these $a_i$ signals do not
pass through too many superimposed gates in succession without being regenerated by a conventional
gate. This is typically not an issue in the case of control bits. Another consequence is that the
total switching energy required for a given processing operation is reduced, because N gates are
replaced with one superimposed gate. This is important because it is likely that the total
switching energy will ultimately be the limiting factor on the switching speed and number of gates
in an optical computer. Other advantages include an increase in computing speed, since some of the
gates are effectively passive, and reduced requirements on the device used to implement the
optical gates.
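
The logical effect of a superimposed gate, and the gate-count saving discussed in this section, can be sketched as follows. The function names `superimposed_gate` and `gate_counts`, and the `closes_on` parameter that selects which state of the control beam makes the aperture transparent, are hypothetical names introduced for this sketch.

```python
def superimposed_gate(b, signals, closes_on=True):
    """One relay controlled by b, shared by all signal beams.

    Returns [b AND a_i] when the relay closes for b == True, or
    [(NOT b) AND a_i] when it closes for b == False.
    """
    transparent = (bool(b) == closes_on)
    return [transparent and bool(a) for a in signals]


def gate_counts(n_pe, n_control_bits):
    """Gates needed for the control bits of a SIMD machine.

    Conventional: one gate per control bit in every PE.
    Superimposed: one shared relay per control bit.
    """
    return {
        "conventional": n_pe * n_control_bits,
        "superimposed": n_control_bits,
    }
```

For example, under these assumptions a machine with 1024 PE's and 16 control bits would need 16384 conventional gates but only 16 superimposed gates.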

CONCLUSIONS 

We  have  shown  that  the  property  of  superposition  can  be  exploited  in  the  design  of  optical  or 
hybrid  optical/electronic  computing  architectures.  It  can  reduce  the  hologram  complexity  for  highly 
parallel  interconnections,  reduce  the  number  of  gates  in  a  SIMD  system,  and  permit  simultaneous 
memory access in a parallel shared memory machine, thereby reducing contention problems. Our
fundamental reason for studying this is that architectures for optical computing must be designed
for the capabilities and limitations of optics; they must not be constrained by the limitations of
electronic systems, which have necessarily dominated approaches to digital parallel computing
architectures to date.

Abstract

In  this  paper  we  describe  Neural  Network  based  algorithms 
for  the  segmentation  of  textured  gray  level  images.  We  for¬ 
mulate  the  problem  as  one  of  minimizing  an  energy  function, 
derived  through  the  representation  of  textures  as  Markov  Ran¬ 
dom  Fields  (MRF).  The  texture  intensity  array  is  modelled  as 
a  Gauss  Markov  Random  Field  (GMRF)  and  an  Ising  model  is 
used  to  characterize  the  label  distribution.  The  resulting  non- 
convex  energy  function  is  minimized  using  a  Hopfield  neural 
network.  The  solution  obtained  is  a  local  optimum  in  general 
and  may  not  be  satisfactory  in  many  cases.  Although  stochastic 
algorithms  like  simulated  annealing  have  a  potential  of  finding 
the  global  optimum,  they  are  computationally  expensive.  We 
suggest  an  alternate  approach  based  on  the  theory  of  learning 
automata  which  introduces  stochastic  learning  into  the  itera¬ 
tions  of  the  Hopfield  network.  A  probability  distribution  over 
the  possible  label  configurations  is  defined  and  the  probabilities 
are  updated  depending  on  the  final  stable  states  reached  by  the 
neural  network.  This  iterated  hill  climbing  algorithm  combines 
the  fast  convergence  of  the  deterministic  relaxation  with  the 
sustained  exploration  of  the  stochastic  algorithms.  The  perfor¬ 
mance  of  this  rule  in  classifying  some  real  textured  images  is 
given. 

1  Introduction 

Neural networks are receiving increasing attention for solving
computationally hard optimization problems in computer vision.
Their inherent parallelism provides an interesting architecture
for implementing many of these algorithms. A few
examples include image restoration [1], stereopsis [2] and
computing optical flow [3]. The standard Hopfield type networks
are designed to minimize certain energy functions and it can
be shown [4] that for networks having symmetric interconnections,
the equilibrium states correspond to the local minima of
the energy function. For practical purposes, networks with few
interconnections are preferred because of the large number of
processing units required in any image processing application.
In this context Markov Random Field (MRF) models for images
play a useful role. They are typically characterized by
local dependencies and symmetric interconnections which can
be expressed in terms of energy functions using Gibbs-Markov
equivalence [5].


Texture  Segmentation  and  classification  is  an  important 
problem  in  computer  vision.  Most  of  the  real  world  objects 
consist  of  textured  surfaces.  One  can  segment  images  based 
on  textures  even  if  there  are  no  apparent  intensity  edges  be¬ 
tween  the  different  segments.  Depending  on  the  nature  of  the 
statistics  obtained  from  the  image  data  different  segmentation 
methods  are  possible.  In  this  paper  we  use  prior  models  for 
the  conditional  intensity  distribution  and  the  texture  class  dis¬ 
tribution  to  segment  and  classify  images  consisting  of  different 
textures.  The  conditional  distribution  of  the  pixel  intensities 
given  the  labels  is  modelled  as  a  fourth  order  Gauss  Markov 
Random  Field  (GMRF).  The  distribution  of  the  labels  in  the 
image  is  characterized  by  an  Ising  model.  The  segmentation 
can  then  be  formulated  as  an  optimization  problem  involving 
minimization of a Gibbs energy function. Finding an optimal
solution to this problem requires an exhaustive search over
possible label configurations, which is practically impossible. It
is well known that stochastic relaxation algorithms like simulated
annealing [5] can find the global optimum if proper cooling
schedules are followed, and [6] describes some segmentation
algorithms based on this approach. Deterministic relaxation
schemes provide very fast solutions but most of the time they
get trapped in local minima. We first describe a neural
network algorithm for carrying out the minimization process
based on deterministic relaxation. It is observed that models
based on GMRF can be easily mapped on to neural networks.
The solutions obtained using this method are sensitive to the
initial configuration and in many cases starting with a maximum
likelihood estimate is preferred. Stochastic learning can
be easily introduced into the above network and the overall
system improves the performance by learning while searching.
The  learning  algorithms  used  are  derived  from  the  theory  of 
stochastic  learning  automata  and  we  believe  that  this  is  the 
first  time  such  a  hybrid  system  has  been  used  in  an  optimiza¬ 
tion  problem.  The  stochastic  nature  of  the  system  helps  in  pre¬ 
venting  the  algorithm  from  being  trapped  in  a  local  minimum 
and  we  observe  that  this  improves  the  quality  of  the  solutions 
obtained. 

The organization of this report is as follows: In section 2
the image model is discussed. Section 3 describes the Hopfield
neural network and section 4 provides a brief review of
the stochastic learning automata and its application to texture
segmentation. Experimental results are given in section 5.


* Partially supported by the AFOSR grant no 86-0196.


2  Image  Model 

We use a fourth order GMRF to represent the conditional probability
density of the image intensity array given its texture labels.
The texture labels are assumed to obey a first or second
order Ising model with a single parameter $\beta$, which measures
the amount of clustering between adjacent pixels.

Let $\Omega$ denote the set of grid points in the $M \times M$ lattice,
i.e., $\Omega = \{(i,j),\ 1 \le i,j \le M\}$. Following Geman and Graffigne
[7] we construct a composite model which accounts for texture
labels and gray levels. Let $\{L_s,\ s \in \Omega\}$ and $\{Y_s,\ s \in \Omega\}$
denote the labels and zero mean gray level arrays respectively.
The zero mean array is obtained by subtracting the local mean
computed from a small window centered at each pixel. Let $N_s$
denote the symmetric fourth order neighborhood of a site $s$.
Then we can write the following expression for the conditional
density of the intensity at the pixel site $s$:


$$P(Y_s = y_s \mid Y_r = y_r,\ r \in N_s,\ L_s = l) = \frac{e^{-U(Y_s = y_s \mid Y_r = y_r,\ r \in N_s,\ L_s = l)}}{Z(l \mid y_r,\ r \in N_s)}$$

where $Z(l \mid y_r,\ r \in N_s)$ is the partition function of the conditional Gibbs
distribution and

$$U(Y_s = y_s \mid Y_r = y_r,\ r \in N_s,\ L_s = l) = \frac{1}{2\sigma_l^2}\Big(y_s^2 - 2\sum_{r \in N_s} \Theta^l_{s,r}\, y_s\, y_r\Big) \qquad (1)$$

In (1), $\sigma_l$ and $\Theta^l$ are the GMRF model parameters of the
$l$-th texture class. The model parameters satisfy
$\Theta^l_{r,s} = \Theta^l_{r-s} = \Theta^l_{s-r} = \Theta^l_\tau$.

The Gibbs energy function computed in (1) should be used
in the classification process. However, seldom do the texture
features tend to be so small as to be captured by a fourth order
neighborhood. Increasing the order of the GMRF model
requires the estimation of additional model parameters which
are quite sensitive. An alternative approach is to calculate
the joint distribution of the intensity conditioned on the texture
label in a small window centered at the pixel site. The
corresponding Gibbs energy can then be used in the relaxation
process for segmentation. We view the image intensity array
as composed of a set of overlapping $k \times k$ windows $W_s$, centered
at each pixel $s \in \Omega$. In each of these windows we assume
that the texture label $L_s$ is homogeneous (all the pixels in the
window belong to the same texture). As before we model the
intensity in the window by a fourth order stationary GMRF.
The local mean is computed by taking the average of the intensities
in the window $W_s$ and is subtracted from the original
image to get the zero mean image. All our references to the
intensity array correspond to the zero mean image. Let $Y^*_s$
denote the 2-D vector representing the intensity array in the
window $W_s$. Using the Gibbs formulation and assuming a free
boundary model, the joint probability density in the window
$W_s$ can be written as

$$P(Y^*_s \mid L_s = l) = \frac{e^{-U_1(Y^*_s \mid L_s = l)}}{Z_1(l)}$$

where $Z_1(l)$ is the partition function and

$$U_1(Y^*_s \mid L_s = l) = \frac{1}{2\sigma_l^2} \sum_{r \in W_s} \Big\{ y_r^2 - 2 \sum_{\tau \in N^* \mid r+\tau \in W_s} \Theta^l_\tau\, y_r\, y_{r+\tau} \Big\} \qquad (2)$$

$N^*$ is the set of shift vectors corresponding to a fourth order
neighborhood system:

$$N^* = \{\tau_1, \tau_2, \tau_3, \ldots, \tau_{10}\} = \{(0,1), (1,0), (1,1), (-1,1), (0,2), (2,0), (1,2), (2,1), (-1,2), (-2,1)\}$$

The label array is modelled as a first or second order Ising
distribution. If $N_s$ denotes the appropriate neighborhood for
the Ising model, then we can write the distribution function
for the texture label at site $s$ conditioned on the labels of the
neighboring sites as:

$$P(L_s \mid L_r,\ r \in N_s) = \frac{e^{-U_2(L_s \mid L_r)}}{Z_2}$$

where $Z_2$ is a normalizing constant and

$$U_2(L_s \mid L_r,\ r \in N_s) = -\beta \sum_{r \in N_s} \delta(L_s - L_r), \quad \beta > 0 \qquad (3)$$

In (3), $\beta$ determines the degree of clustering, and $\delta(i - j)$
is the Kronecker delta. Using the Bayes rule, we can write

$$P(L_s \mid Y^*_s, L_r,\ r \in N_s) = \frac{P(Y^*_s \mid L_s)\, P(L_s \mid L_r)}{P(Y^*_s)} \qquad (4)$$


Since $Y^*_s$ is known, the denominator in (4) is just a normalizing
factor. The numerator is a product of two exponential
functions and can be expressed as

$$P(L_s \mid Y^*_s, L_r,\ r \in N_s) = \frac{1}{Z_p}\, e^{-U_p(L_s \mid Y^*_s, L_r,\ r \in N_s)} \qquad (5)$$

where $Z_p$ is a normalizing factor and $U_p(\cdot)$ is the posterior
energy corresponding to (5). From (1) and (2) we can write

$$U_p(L_s \mid Y^*_s, L_r,\ r \in N_s) = w(L_s) + U_1(Y^*_s \mid L_s) + U_2(L_s \mid L_r) \qquad (6)$$
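
The step from the Bayes rule (4) to the posterior energy (6) can be spelled out as follows. This is a sketch that assumes the window density has the Gibbs form $e^{-U_1}/Z_1(L_s)$ and the label prior the form $e^{-U_2}/Z_2$, as in this section, and that collects every factor not depending on $L_s$ (including $Z_2$ and $P(Y^*_s)$) into the normalizing factor $Z_p$.

```latex
\begin{align*}
P(L_s \mid Y^*_s, L_r,\ r \in N_s)
  &\propto P(Y^*_s \mid L_s)\, P(L_s \mid L_r) \\
  &= \frac{e^{-U_1(Y^*_s \mid L_s)}}{Z_1(L_s)} \cdot
     \frac{e^{-U_2(L_s \mid L_r)}}{Z_2} \\
  &\propto \exp\bigl\{-\bigl[\log Z_1(L_s) + U_1(Y^*_s \mid L_s)
     + U_2(L_s \mid L_r)\bigr]\bigr\}
\end{align*}
```

Comparing the exponent with (6) identifies the class-dependent bias as $w(L_s) = \log Z_1(L_s)$.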

Note that the second term in (6) relates the observed pixel
intensities to the texture labels and the last term specifies the
label distribution. The bias term $w(L_s) = \log Z_1(L_s)$ is dependent
on the texture class and it can be explicitly evaluated for
the GMRF model considered here using the toroidal boundary
assumption. However, the computations become very cumbersome
if toroidal assumptions are not made. An alternate approach
is to estimate the bias from the histogram of the data as
suggested by Geman and Graffigne [7]. Finally, the posterior
distribution of the texture labels for the entire image given the
intensity array is


$$P(L \mid Y^*) = \frac{P(Y^* \mid L)\, P(L)}{P(Y^*)} \qquad (7)$$


Maximizing (7) gives the optimal Bayesian estimate. Though
it is possible in principle to compute the right-hand side of (7)
and find the global optimum, the computational burden involved
is so enormous that it is practically impossible to do
so. However, we note that the stochastic relaxation algorithms
like simulated annealing require only the computation of (5) to
obtain the optimal solution. The network relaxation algorithm
in section 3 also uses these values, but is guaranteed to find
only the local minima.
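
As a rough illustration of the window energy $U_1$ used in the posterior above, the following sketch evaluates the free-boundary sum over one $k \times k$ window in plain Python. The function name `window_energy`, the dictionary layout for the parameters $\Theta^l_\tau$, and the assumptions that $k$ is odd and the window lies entirely inside the image are choices made for this sketch, not details given in the text.

```python
# Shift vectors of the asymmetric fourth order neighborhood N*.
N_STAR = [(0, 1), (1, 0), (1, 1), (-1, 1), (0, 2),
          (2, 0), (1, 2), (2, 1), (-1, 2), (-2, 1)]


def window_energy(y, center, k, theta, sigma2, shifts=N_STAR):
    """Gibbs energy of the k x k window centred at `center` for one class.

    y      : 2-D list of zero mean intensities
    center : (row, col) of the pixel site s
    k      : odd window size; the window must fit inside the image
    theta  : dict mapping shift vector -> GMRF parameter for this class
    sigma2 : variance parameter sigma_l**2 for this class
    Free boundary: pairs that reach outside the window are skipped.
    """
    ci, cj = center
    half = k // 2
    rows = range(ci - half, ci + half + 1)
    cols = range(cj - half, cj + half + 1)
    total = 0.0
    for i in rows:
        for j in cols:
            total += y[i][j] ** 2
            for (di, dj) in shifts:
                ii, jj = i + di, j + dj
                if ii in rows and jj in cols:      # r + tau inside W_s
                    total -= 2.0 * theta.get((di, dj), 0.0) * y[i][j] * y[ii][jj]
    return total / (2.0 * sigma2)
```

Evaluating this for every texture class at a site and adding the class bias gives the data-dependent part of the posterior energy for that site.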

3  A Neural Network for Texture Classification

In  this  section  we  consider  a  deterministic  relaxation  algorithm 
based  on  the  image  model  described  above.  This  algorithm  can 
be  implemented  in  a  highly  parallel  fashion  on  a  neural  network 
architecture  and  typically  converges  in  20-30  iterations. 

We begin by describing a network for the segmentation
problem and the energy function it minimizes. This energy
function is obtained from the image model described in (2). For
convenience of notation let $U_1(i,j,l) = U_1(Y^*_s, L_s = l) + w(l)$,
where $s = (i,j)$ denotes a pixel site and $U_1(\cdot)$ and $w(l)$ are
as defined in (6). The network consists of $K$ layers, each layer
arranged as an $M \times M$ array, where $K$ is the number of texture
classes in the image and $M$ is the dimension of the image. The
elements (neurons) in the network are assumed to be binary
and are indexed by $(i,j,l)$, where $(i,j) = s$ refers to their
position in the image and $l$ refers to the layer. The $(i,j,l)$-th
neuron is said to be ON if its output $V_{ijl}$ is 1, indicating that
the corresponding site $s = (i,j)$ in the image has the texture
label $l$. Let $T_{ijl;i'j'l'}$ be the connection strength between the
neurons $(i,j,l)$ and $(i',j',l')$, and $I_{ijl}$ be the input bias current.

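Then a general form for the energy of the network is [4]

$$E = -\frac{1}{2} \sum_{i,j=1}^{M} \sum_{l=1}^{K} \sum_{i',j'=1}^{M} \sum_{l'=1}^{K} T_{ijl;i'j'l'}\, V_{ijl}\, V_{i'j'l'} - \frac{1}{2} \sum_{i,j=1}^{M} \sum_{l=1}^{K} I_{ijl}\, V_{ijl} \qquad (8)$$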

From our discussion in section 2 we note that an approximate
solution for the MAP estimate can be obtained by minimizing
(6) for each site in the image. It is easy to see that this
is equivalent to minimizing the following energy function for
the network:

$$E = \frac{1}{2} \sum_{s=(i,j)} \sum_{l=1}^{K} U_1(i,j,l)\, V_{ijl} - \frac{\beta}{2} \sum_{l=1}^{K} \sum_{i=1}^{M} \sum_{j=1}^{M} \sum_{(i',j') \in N_{ij}} V_{i'j'l}\, V_{ijl} \qquad (9)$$

where $N_{ij}$ is the neighborhood of site $(i,j)$ (same as the $N_s$
in section 2). In (9), it is implicitly assumed that each pixel site
has a unique label, i.e. only one neuron is active in each column
of the network. This constraint can be implemented in different
ways. A simple method is to use a winner-takes-all circuit for
each column so that the neuron receiving the maximum input is
turned on and the others are turned off. Alternately a penalty
term can be introduced in (9) to represent the constraint as in
[4]. From (8) and (9) we can identify the parameters for the
network,


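$$T_{ijl;i'j'l'} = \begin{cases} \beta & \text{if } (i',j') \in N_{ij},\ \forall\, l = l' \\ 0 & \text{otherwise} \end{cases} \qquad (10)$$

and the bias current

$$I_{ijl} = -U_1(i,j,l)$$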


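The input-output relation can be stated as follows: Let $u_{ijl}$
be the potential of neuron $(i,j,l)$ (note: $l$ is the layer number
corresponding to texture class $l$), then

$$u_{ijl} = \sum_{i'=1}^{M} \sum_{j'=1}^{M} \sum_{l'=1}^{K} T_{ijl;i'j'l'}\, V_{i'j'l'} + I_{ijl} \qquad (11)$$

and

$$V_{ijl} = \begin{cases} 1 & \text{if } u_{ijl} = \min_{l'} \{u_{ijl'}\} \\ 0 & \text{otherwise} \end{cases} \qquad (12)$$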


In (10) we have no self feedback, i.e. $T_{ijl;ijl} = 0,\ \forall i,j,l$,
and all the connections have equal strengths. The updating
scheme ensures that at each stage the energy decreases. Since
the energy is bounded, the convergence of the above system is
assured but the stable state will in general be a local optimum.

This neural model is one version of the Iterated Conditional
Mode algorithm (ICM) of Besag [8]. This algorithm maximizes
the conditional probability $P(L_s = l \mid Y_s^*, L_{s'}, s' \in N_s)$ during
each iteration. ICM is a local deterministic relaxation algorithm
and very easy to implement. We observe that in general
many algorithms based on MRF models can be easily mapped
on to neural networks with local interconnections.
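
The deterministic relaxation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `icm_relax`, the nested-list layout `unary[i][j][l]` for the single-site energies $U_1(i,j,l)$, and the 4-connected neighborhood are all hypothetical choices (the neighborhood $N_{ij}$ is defined in section 2). Each site is visited in turn and given the label with the lowest local energy, so the total energy never increases.

```python
def icm_relax(unary, labels, beta, max_sweeps=50):
    """Asynchronous deterministic relaxation (ICM-style sketch).

    unary[i][j][l] : single-site energy U_1(i, j, l) (assumed layout)
    labels[i][j]   : initial label of each site
    beta           : bonding strength between neighbouring sites
    Returns (final labels, number of sweeps used).
    """
    rows, cols = len(unary), len(unary[0])
    n_labels = len(unary[0][0])
    labels = [row[:] for row in labels]  # work on a copy
    for sweep in range(1, max_sweeps + 1):
        changed = False
        for i in range(rows):
            for j in range(cols):
                # Labels of the 4-connected neighbours (assumed neighbourhood).
                nbrs = [labels[a][b]
                        for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                        if 0 <= a < rows and 0 <= b < cols]
                # Local energy: single-site term minus beta per agreeing neighbour.
                local = [unary[i][j][l] - beta * nbrs.count(l)
                         for l in range(n_labels)]
                best = local.index(min(local))
                # Change the label only if the energy strictly decreases.
                if local[best] < local[labels[i][j]]:
                    labels[i][j] = best
                    changed = True
        if not changed:
            return labels, sweep
    return labels, max_sweeps
```

Updating one site at a time (rather than all sites simultaneously) is what guarantees the monotone decrease in energy; the stable state reached is in general only a local optimum.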


4  Stochastic  Learning  Algorithms 


We begin with a brief introduction to the Stochastic Learning
Automaton (SLA). An SLA is a decision maker operating
in a random environment. A stochastic automaton can be defined
by a quadruple $(\alpha, Q, T, R)$, where $\alpha = \{\alpha_1, \ldots, \alpha_N\}$ is
the set of actions available to the automaton. The action selected
at time $t$ is denoted by $\alpha(t)$. $Q(t)$ is the state of the
automaton at time $t$ and consists of the action probability vector
$p(t) = [p_1(t), \ldots, p_N(t)]$, where $p_i(t) = \text{prob}(\alpha(t) = \alpha_i)$
and $\sum_i p_i(t) = 1\ \forall t$. The environment responds to the action
$\alpha(t)$ with a $\lambda(t) \in R$, $R$ being the set of the environment's
responses. The state transitions of the automaton are governed
by the learning algorithm $T$, $Q(t+1) = T(Q(t), \alpha(t), \lambda(t))$.
Without loss of generality it can be assumed that $R = [0,1]$,
i.e., the responses are normalized to lie in the interval [0,1], '1'
indicating a complete success and '0' total failure. The goal
of the automaton is to converge to the optimal action, i.e. the
action which results in the maximum expected reward. Again
without loss of generality let $\alpha_1$ be the optimal action and
$d_1 = E[\lambda(t) \mid \alpha_1] = \max_i \{E[\lambda(t) \mid \alpha_i]\}$. At present no learning
algorithm exists which is optimal in the above sense. However,
we can choose the parameters of certain learning algorithms
so as to realize a response as close to the optimum as desired.
This condition is called $\epsilon$-optimality. If $M(t) = E[\lambda(t) \mid p(t)]$,
then a learning algorithm is said to be $\epsilon$-optimal if it results in
an $M(t)$ such that

$$\lim_{t \to \infty} E[M(t)] > d_1 - \epsilon \qquad (13)$$

for a suitable choice of parameters and for any $\epsilon > 0$. One
of the simplest learning schemes is the Linear Reward-Inaction
rule, $L_{R-I}$. Suppose at time $t$ we have $\alpha(t) = \alpha_i$ and $\lambda(t)$
is the response received; then according to the $L_{R-I}$ rule,

$$p_i(t+1) = p_i(t) + a\,\lambda(t)\,[1 - p_i(t)]$$
$$p_j(t+1) = p_j(t)\,[1 - a\,\lambda(t)], \quad \forall j \neq i \qquad (14)$$

where $a$ is a parameter of the algorithm controlling the
learning rate. Typical values for $a$ are in the range 0.01-0.1.
It can be shown that this $L_{R-I}$ rule is $\epsilon$-optimal in all stationary
environments, i.e., there exists a value for the parameter
$a$ so that condition (13) is satisfied.
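
As a small illustration of the Linear Reward-Inaction update (14), the following Python sketch applies one step of the rule to an action probability vector. The function name `lri_update` and its argument names are hypothetical; only the arithmetic is taken from (14).

```python
def lri_update(p, chosen, response, rate):
    """One Linear Reward-Inaction step, following (14).

    p        : current action probabilities (they sum to 1)
    chosen   : index i of the action selected at time t
    response : environment response lambda(t), normalized to [0, 1]
    rate     : learning-rate parameter a
    """
    return [pk + rate * response * (1.0 - pk) if k == chosen
            else pk * (1.0 - rate * response)
            for k, pk in enumerate(p)]


# Example: four equally likely actions, action 2 fully rewarded.
p_next = lri_update([0.25, 0.25, 0.25, 0.25], chosen=2, response=1.0, rate=0.1)
```

The probability of the selected action grows in proportion to the response, the others shrink by the same total amount, and a response of 0 leaves the vector unchanged (the "inaction" part of the rule).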

Collective behavior of a group of automata has also been
studied. Consider a team of $N$ automata $A_i$ $(i = 1, \ldots, N)$, each
having $r_i$ actions $\alpha^i = \{\alpha^i_1, \ldots, \alpha^i_{r_i}\}$. At any instant $t$ each
member of the team makes a decision $\alpha^i(t)$. The environment
responds to this by sending a reinforcement signal $\lambda(t)$ to all
the automata in the group. This situation represents a cooperative
game among a team of automata with identical payoff.
All the automata update their action probability vectors
according to (14) using the same learning rate and the process
repeats. Local convergence results can be obtained in the case of
stationary random environments. Variations of this rule have
been applied to complex problems like decentralized control of
Markov chains [9] and relaxation labelling [10].

The texture classification discussed in the previous sections
can be treated as a relaxation labelling problem and stochastic
automata can be used to learn the labels (texture classes) for the
pixels. A learning automaton is assigned to each of the pixel
sites in the image. The actions of an automaton correspond to
selecting a label for the pixel site to which it is assigned. Thus
each automaton has $K$ actions and a probability distribution
over this action set. Initially the labels are assigned randomly
with equal probability. Since the number of automata involved
is very large, it is not practicable to update the action probability
vector at each iteration. Instead we combine the iterations
of the neural network described in the previous section with the
stochastic learning algorithm. This results in an iterative
hill-climbing type algorithm which combines the fast convergence
of deterministic relaxation with the sustained exploration of
the stochastic algorithm. The stochastic part prevents the algorithm
from getting stuck in a local minimum and at the same
time "learns" from the search by updating the state probabilities.
However, unlike simulated annealing, we cannot guarantee
convergence to the global optimum. Each cycle now has two
phases: the first phase consists of the deterministic relaxation
network converging to a solution. The second phase consists of
the learning network updating its state, the new state being
determined by the equilibrium state of the relaxation network. A
new initial state is generated by the learning network depending
on its current state and the cycle repeats. Thus relaxation
and learning alternate with each other. After each iteration the
probability of the more stable states increases and, because of
the stochastic nature of the algorithm, the possibility of getting
trapped in a bad local minimum is reduced. The algorithm is
summarized below:

4.1  Learning  Algorithm 

Let the pixel site be denoted (as in section 2) by $s \in \Omega$ and
the number of texture classes be $L$. Let $A_s$ be the automaton
assigned to site $s$ and the action probability vector of $A_s$
be $p_s(t) = [p_{s,1}(t), \ldots, p_{s,L}(t)]$ and $\sum_l p_{s,l}(t) = 1\ \forall s,t$, where
$p_{s,l}(t) = \text{prob}(\text{label of site } s = l)$. The steps in the algorithm
are:

1. Initialize the action probability vectors of all the automata:

$$p_{s,l}(0) = 1/L, \quad \forall s,l$$

Initialize the iteration counter to 0.

2.  Choose  an  initial  label  configuration  sampled  from  the 
distribution  of  these  probability  vectors. 

3.  Start  the  neural  network  of  section  3  with  this  configura¬ 
tion. 

4. Let $l_s$ denote the label for site $s$ at equilibrium. Let the
current time (iteration number) be $t$. Then the action
probabilities are updated as follows:

$$p_{s,l_s}(t+1) = p_{s,l_s}(t) + a\,\lambda(t)\,[1 - p_{s,l_s}(t)]$$
$$p_{s,j}(t+1) = p_{s,j}(t)\,[1 - a\,\lambda(t)], \quad \forall s \text{ and } \forall j \neq l_s \qquad (15)$$

The response $\lambda(t)$ is derived as follows: Suppose the
present label configuration resulted in a lower energy state
compared to the previous one; then it results in $\lambda(t) = \lambda_1$,
and if the energy increases we have $\lambda(t) = \lambda_2$, with
$\lambda_1 > \lambda_2$. In our simulations we have used $\lambda_1 = 1$ and
$\lambda_2 = 0.25$.

5. Generate a new configuration from these updated label
probabilities, increment the iteration counter and go to
step 3.

Thus the system consists of two layers, one for relaxation
and the other for learning. The relaxation network is similar
to the one considered in section 3; the only difference is that
the initial state is decided by the learning network. The learning
network consists of a team of automata, and learning takes
place at a much lower speed than relaxation, with fewer
updates. The probabilities of the labels corresponding to
the final state of the relaxation network are increased according
to (15). Using these new probabilities a new configuration
is generated. Since the response does not depend on time, this
corresponds to a stationary environment and, as we have noted
before, this $L_{R-I}$ algorithm can be shown to converge to a stationary
point, not necessarily the global optimum.
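
The alternation of relaxation and learning in steps 1-5 can be sketched as follows. This is a minimal Python illustration under several assumptions that are not in the text: a 4-connected neighborhood, the nested-list layout `unary[i][j][l]` for the single-site energies, treating the first cycle (and an unchanged energy) as a fixed choice of response, and returning the lowest-energy configuration seen. All function and parameter names are hypothetical.

```python
import random

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # assumed 4-neighbourhood


def neighbour_labels(labels, i, j):
    rows, cols = len(labels), len(labels[0])
    return [labels[i + di][j + dj] for di, dj in NEIGHBOURS
            if 0 <= i + di < rows and 0 <= j + dj < cols]


def energy(unary, labels, beta):
    # Single-site terms minus beta/2 per ordered pair of agreeing neighbours.
    return sum(unary[i][j][l] - 0.5 * beta * neighbour_labels(labels, i, j).count(l)
               for i, row in enumerate(labels) for j, l in enumerate(row))


def relax(unary, labels, beta, max_sweeps=50):
    # Deterministic relaxation: each site takes its lowest local-energy label.
    k = len(unary[0][0])
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(labels)):
            for j in range(len(labels[0])):
                nbrs = neighbour_labels(labels, i, j)
                local = [unary[i][j][l] - beta * nbrs.count(l) for l in range(k)]
                best = local.index(min(local))
                if local[best] < local[labels[i][j]]:
                    labels[i][j] = best
                    changed = True
        if not changed:
            break
    return labels


def learn_and_relax(unary, beta, rate=0.05, cycles=20,
                    lam_good=1.0, lam_bad=0.25, seed=0):
    rng = random.Random(seed)
    rows, cols, k = len(unary), len(unary[0]), len(unary[0][0])
    # Step 1: uniform action probabilities for every site.
    probs = [[[1.0 / k] * k for _ in range(cols)] for _ in range(rows)]
    prev_e, best = None, None
    for _ in range(cycles):
        # Steps 2 and 5: sample a configuration from the label probabilities.
        labels = [[rng.choices(range(k), weights=probs[i][j])[0]
                   for j in range(cols)] for i in range(rows)]
        labels = relax(unary, labels, beta)              # step 3
        e = energy(unary, labels, beta)
        # Response: lam_good if the energy dropped (or on the first cycle,
        # an assumption), lam_bad otherwise.
        lam = lam_good if prev_e is None or e < prev_e else lam_bad
        for i in range(rows):                            # step 4, update (15)
            for j in range(cols):
                for l in range(k):
                    p = probs[i][j][l]
                    probs[i][j][l] = (p + rate * lam * (1.0 - p)
                                      if l == labels[i][j]
                                      else p * (1.0 - rate * lam))
        if best is None or e < best[0]:
            best = (e, [row[:] for row in labels])
        prev_e = e
    return best[1], best[0], probs
```

The defaults `lam_good=1.0` and `lam_bad=0.25` mirror the values quoted in step 4; the learning rate and number of cycles are placeholders to be tuned.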

5  Experimental  Results 

The algorithms are tested on real textures consisting of wood,
wool, calf skin, pig skin, sand and grass. Previously computed
texture parameters are used in the experiment. The energy
functions are obtained by constructing an 11 x 11 window around
each of the pixel sites and computing the Gibbs measure corresponding
to the joint distribution in the window. It is assumed
that the texture is homogeneous within the window. The bias
values for the various textures, $w(l_s)$, are chosen by trial and
error. These values depend on the different textures present in
the image and a discussion regarding their estimation can be
found in [6]. The resulting segmentation is sensitive to the bias
weights, particularly for the pigskin and sand textures as their
properties are very similar. Larger values of $\beta$ in (3) favour
more homogeneous patches in the segmented image and we
used values ranging between 0.3 and 2.0 in our experiments.
The algorithms are tested on three images. The first two are
two-class problems consisting of calf skin and grass textures.
The resulting classification is shown in figure (1). The results
for the six-class problem are shown in figure (2).


The deterministic relaxation scheme of section 3 usually
takes about 20-40 iterations to converge. For comparison purposes,
the percentage misclassification for example 2 was computed
for the various algorithms. The deterministic relaxation
resulted in an error of about 15% compared to 22% for the maximum
likelihood method. For the learning algorithm the error
was 8.7%. Compared to this, the simulated annealing algorithm
[6] had a misclassification of about 6.8%. Simulated annealing
is at least 10-15 times computationally more expensive than the
deterministic algorithms. It was also observed that the deterministic
relaxation algorithm performs better when started
with a random configuration than with maximum likelihood
estimates. In terms of the classification error for example 2,
this difference was about 1%. Also, when all the neurons in the
deterministic relaxation scheme were updated simultaneously,
the system tended to converge to limit cycles, switching between
two nearby states. Such cycles must be identified when considering
parallel implementation of neural networks on computers.
This will not be a problem in the case of analog networks, as the
probability of such an event happening in a physical system is
zero.

6  Conclusions 

In this paper we have described learning and neural network
algorithms for texture classification. The deterministic relaxation
(ICM) algorithm can be easily mapped to a neural network
for minimizing the energy function, and reasonably good
solutions can be obtained quickly. Texture classification is
treated as a relaxation labelling problem and a learning system
is developed to learn the pixel classes. Convergence of
both these schemes to local optima can be proved. It is observed
that learning is a very slow process and hence cannot
be directly used in the segmentation process. A combination
of learning and deterministic relaxation seems to improve the
quality of solutions obtained and performs well compared to
simulated annealing in terms of speed.

The  learning  algorithm  described  in  this  paper  is  very  gen¬ 
eral  and  can  be  applied  in  a  variety  of  situations.  This  model 
might  be  particularly  useful  if  little  is  known  about  the  image 
models  and  a  good  criterion  function  is  available  for  generating 
a  reinforcement  signal.  Currently  we  are  working  on  extend¬ 
ing  this  method  to  hierarchical  segmentation  and  other  image 
processing applications.

An incoherent optical neuron is proposed that subtracts inhibitory inputs from excitatory inputs optically by
utilizing  two  separate  device  responses.  Functionally  it  accommodates  positive  and  negative  weights,  excitatory 
and  inhibitory  inputs,  and  nonnegative  neuron  outputs,  and  it  can  be  used  in  a  variety  of  neural  network  models. 
An  extension  is  given  to  include  bipolar  neuron  outputs  in  the  case  of  fully  connected  networks. 


In  this  Letter  we  propose  a  general  incoherent  optical 
neuron  (ION)  model  that  can  process  excitatory  and 
inhibitory  signals  optically  without  electronic  subtrac¬ 
tion.  Conceptually,  the  inhibitory  signal  represents  a 
negative  signal  with  a  positive  synaptic  weight  or  a 
positive  signal  with  a  negative  synaptic  weight.  The 
ION  can  be  used  in  a  network  in  which  the  neuron 
outputs  are  nonnegative  and  the  synaptic  weights  are 
bipolar,  for  example,  by  connecting  the  interconnec¬ 
tions  with  negative  weights  to  the  inhibitory  neuron 
inputs  and  those  with  positive  weights  to  the  excitato¬ 
ry  inputs.  Our  intent  is  to  show  that  it  is  in  principle 
not  necessary  to  go  to  optoelectronic  devices  solely 
because  of  the  requirement  for  subtraction  capability. 

Techniques  that  have  been  described  to  date  are 
impractical  in  all-optical  implementations  of  most 
neural  networks.  They  utilize  an  intensity  and/or 
weight  bias,  in  some  cases  coupled  with  complemented 
weights  or  inputs.  As  noted  in  Ref.  1,  these  tech¬ 
niques  suffer  from  bias  buildup  and/or  thresholds  that 
must  vary  from  neuron  to  neuron.  A  technique  de¬ 
scribed  in  Ref.  2  eliminates  most  of  these  drawbacks  in 
the  special  case  of  fully  connected  networks. 

The  ION  model  uses  separate  device  responses  for 
inhibitory  and  excitatory  inputs.  This  is  modeled  af¬ 
ter  the  biological  neuron,  which  processes  the  excitato¬ 
ry  and  inhibitory  signals  by  different  mechanisms 
(e.g.,  chemical-selected  receptors  and  ion-selected 
gate channels).3 The ION comprises two elements:
an inhibitory, I, element and a nonlinear output, N,
element. The inhibitory element provides inversion
of the sum of the inhibitory signals; the nonlinear
element operates on the sum of the excitatory signals,
the inhibitory element output, and an optical bias to
produce the output of the neuron. The inhibitory
element is linear; the nonlinear threshold of the neuron
is provided entirely by the nonlinear output element.
Figures 1(a) and 1(c) show the characteristic
curves of the I and N elements, respectively. The
structure of the ION model is illustrated in Fig. 1(d).
The input-output relationships of the I and N elements
are


$$I^{(I)}_{out} = 1 - I^{(I)}_{in} = 1 - I_{inh}, \qquad (1)$$

$$I^{(N)}_{out} = \psi[I^{(N)}_{in} - \alpha] = \psi(I^{(I)}_{out} + I_{exc} + I_{bias} - \alpha), \qquad (2)$$

where $I_{inh}$ and $I_{exc}$ represent the total inhibitory and
excitatory inputs, respectively, $I^{(N)}_{in}$ is the total input
to the N element, $I_{bias}$ is the bias term for the N
element, which can be varied to change the threshold,
and $\alpha$ is the offset of the characteristic curve of the N
element. $\psi(\cdot)$ denotes the nonlinear output function
of the neuron. If we choose $I_{bias}$ to be $\alpha - 1$, the output
of the N element is

$$I^{(N)}_{out} = \psi(I_{exc} - I_{inh}), \qquad (3)$$

which is the desired subtraction. In general, the I
element will not be normalized [Fig. 1(b)], in which
case the offset, $a_1$, and the slope of its response can be
compensated by setting $I_{bias} = \alpha - a_1$ and attenuating
the output of the I element by a factor of $b_1/a_1$, respectively.
The unnormalized I element must have gain
greater than or equal to 1. A nonzero neuron threshold
$\theta$ can be implemented by shifting the bias by the
same amount, so $I_{bias} = \alpha - a_1 - \theta$ for the unnormalized
I element.

Fig. 1. The ION. (a) The inhibitory element; (b) the unnormalized
inhibitory element; (c) the nonlinear element;
(d) the structure.

Fig. 2. Characteristic response of a Hughes twisted-nematic
liquid-crystal light valve, a possible device for the homogeneous
ION model.
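
The subtraction performed by the two elements can be checked numerically with a short Python sketch of the normalized case. It is an illustration only: the function names, the hard-threshold and identity choices for $\psi$, and the numerical values are assumptions, and the threshold is applied by shifting the bias as described above with the normalized offset of 1.

```python
def i_element(i_inh):
    """Normalized, linear inhibitory element: output = 1 - input (Eq. 1)."""
    return 1.0 - i_inh


def n_element(total_in, alpha, psi):
    """Nonlinear output element with offset alpha (Eq. 2)."""
    return psi(total_in - alpha)


def ion_output(i_exc, i_inh, alpha, psi, theta=0.0):
    # Bias chosen as alpha - 1 (shifted by the threshold theta) so that the
    # offsets cancel and only I_exc - I_inh - theta reaches psi.
    i_bias = alpha - 1.0 - theta
    return n_element(i_element(i_inh) + i_exc + i_bias, alpha, psi)


def step(x):
    """Hypothetical hard-threshold output function."""
    return 1.0 if x > 0 else 0.0


# Excitation exceeds inhibition: the neuron turns on.
on = ion_output(i_exc=0.6, i_inh=0.2, alpha=1.5, psi=step)
# Inhibition exceeds excitation: the neuron stays off.
off = ion_output(i_exc=0.2, i_inh=0.6, alpha=1.5, psi=step)
```

All quantities passed between the elements are nonnegative intensities as long as the bias itself is nonnegative, which is why the offset of the N element has to be at least as large as the sum of the I-element offset and the threshold.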

The ION model can be implemented by using separate
devices for the I and N elements as depicted in
Fig. 1 (the heterogeneous case) or by using a single
device with a nonmonotonic response (Fig. 2) to implement
both elements (the homogeneous case). Possible
devices for ION implementation include bistable
optical arrays and spatial light modulators such as
liquid-crystal light valves. A single Hughes liquid-crystal
light valve could implement both elements.
The offset of the device response must satisfy $\alpha > n a_1 + \theta$,
where $n = 1$ for a heterogeneous implementation
and $n = 2$ for a homogeneous implementation.

A device to realize the inhibitory I element will, of
course, not have a perfectly linear response. To assess
the robustness of this model to nonlinear I elements
and compensation techniques for large deviations
from an ideal response, we have performed simulations
of a network similar to Grossberg's competitive
network4 for edge detection. The simulated network
contains 30 conventional inner-product neurons connected
in a ring structure with input and lateral
on-center-off-surround connections. We model the normalized
I element response as $\exp[-(x/a)^b]$, where $a$
and $b$ are parameters that determine the specific nonlinear
response. This provides insight into the sensitivity
of the ION model to nonlinearities in the I element
response without being overly specific to one
given device. A suitable choice of $a$ and $b$ does provide
a close fit to the inversion region of the normalized
experimental characteristic of a liquid-crystal light
valve. By adjusting the parameters $a$ and $b$, four different
nonlinear inversion curves were simulated in
this network. In the simulation a compensating attenuator
was used before the I element instead of after
it. The N element response, which provides a close
approximation  to  the  normalized  increasing  portion  of 
the  liquid-crystal  light  valve  response  (Fig.  2),  is  mod¬ 
eled  as  1  -  exp(-(x/0.43)1 2].  Figure  3  shows  the  com¬ 
puter-simulated  responses  of  this  network.  Each  re¬ 
solvable  row  of  Fig.  3  represents  a  one-dimensional 



simulation on a distinct one-dimensional input. Thirty
different binary inputs were each simulated at four
different input signal levels. Figure 3(b) gives the
ideal output, and Fig. 3(d) simulates a response that is
close to the experimental response of our liquid-crystal
light valve. Deviation from linearity of the I element
is measured by normalized mean-squared error,
which is defined as $\int [\hat{v}(i) - v(i)]^2\,di \,/ \int v(i)^2\,di$, where
$v(i)$ and $\hat{v}(i)$ are the output values of the linear and
simulated nonlinear characteristic curves, respectively.
The input level $i$ ranged from 0 to 0.7. Our liquid-crystal
light valve characteristic has a normalized
mean-squared error of 50%, which does not perform
well. If proper input attenuation of the I element is
included, the network performs correctly. Four nonlinear
curves are simulated, each with optimal input
attenuation; we find that deviations from linearity
that give a normalized mean-squared error of approximately
15% (measured after input attenuation) can be
tolerated. For more extremely nonlinear devices, a
bias point and limited region of operation can be used.
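
The normalized mean-squared error defined above is straightforward to evaluate numerically. The Python sketch below is an illustration under assumptions: the ideal linear response is taken to be $1 - i$ as in Eq. (1), the input attenuation is modeled by dividing the input by a factor `s_in` before the nonlinear element, and the integrals are approximated by the trapezoid rule. The function names are hypothetical and no attempt is made to reproduce the percentages quoted in the text.

```python
import math


def i_response(x, a, b, s_in=1.0):
    """Model I-element response exp[-(x/a)^b].

    The input is first divided by s_in (assumed form of the input attenuation).
    """
    return math.exp(-((x / s_in) / a) ** b)


def normalized_mse(nonlinear, linear, lo=0.0, hi=0.7, steps=700):
    """Integral of (v_hat - v)^2 divided by integral of v^2 over [lo, hi]."""
    h = (hi - lo) / steps
    num = den = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        w = 0.5 if k in (0, steps) else 1.0   # trapezoid end-point weights
        num += w * (nonlinear(x) - linear(x)) ** 2
        den += w * linear(x) ** 2
    return num / den


def ideal(x):
    """Ideal normalized linear inversion."""
    return 1.0 - x


raw = normalized_mse(lambda x: i_response(x, 0.25, 1.2), ideal)
attenuated = normalized_mse(lambda x: i_response(x, 0.25, 1.2, s_in=2.5), ideal)
```

Under these assumptions the attenuated curve lies much closer to the ideal line than the unattenuated one, which is the qualitative effect the compensation is meant to achieve.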

Fig. 3. Simulation of imperfect I elements in the one-dimensional
on-center-off-surround competitive network.
($S_I$ is the input attenuation factor for the I element; mse is
the normalized mean-squared error deviation from linearity
of the I element response.) (a) The network input. (b)-(f)
The network outputs for (b) the linear I element; (c) a =
0.46, b = 2.1, $S_I$ = 1.0, mse = 19%; (d) a = 0.25, b = 1.2, $S_I$ =
2.5, mse = 4%; (e) a = 0.33, b = 1.5, $S_I$ = 1.5, mse = 14%; (f) a
= 0.1 fi, b = 0.9, $S_I$ = 5.0, mse = 7%.

Fig. 4. Single-layer feedback net using a single spatial light
modulator, SLM, to implement both I and N elements.
B.S., beam splitter; ND, neutral-density filter.

The fan in and fan out of the ION, neglecting interconnection
effects such as cross talk, can be calculated
as follows. (Interconnection effects are important but
are not peculiar to the ION model.) We assume binary
neurons. As shown in Fig. 1(c), the output of the
$i$th neuron can be formulated as $I_r + I_s^{(N)} V_i$, where $V_i \in \{0, 1\}$
is the output state of neuron $i$ and $I_r$ is the
residual output of element N. Let the fan in and fan
out of each neuron be $N_{in}$ and $N_{out}$, respectively. The
summed inputs to neuron $j$ can be grouped into two
terms, a noise term caused by residual outputs ($I_r$) of
the optical neurons and the signal term. Consider the
worst case, i.e., all weights are close to one and only one
input is active. If we assume that each neuron must
be able to discriminate a change in any one of its input
lines, then the signal term must at least be greater
than the noise term. This is a reasonable assumption
for networks with small fan in and fan out. Thus the
maximum fan in is

$$N_{in}^{(max)} = \frac{I_s^{(N)}}{I_r} = \text{extinction ratio of element N}. \qquad (4)$$


The fan out is calculated from the I element, as
shown in Fig. 1(b). The ratio of the maximum input
$a_1$ to the minimum input $I_s^{(N)}/N_{out}^{(max)}$ is the fan in
$N_{in}^{(max)}$, where $N_{out}^{(max)}$ is the maximum fan out over
all neurons; thus

$$N_{out}^{(max)} \approx \frac{I_s^{(N)}}{a_1} N_{in}^{(max)}, \qquad (5)$$

where the approximation holds when the extinction
ratio of the N element is large. For networks with
large fan in, we assume instead that the neuron can
discriminate a change in a constant fraction $\beta$ of its
input signals. In this case, there are no such limitations
on $N_{in}$ and $N_{out}$. Instead, $1/\beta$ is limited by the
extinction ratio of the N element, and the fan out is
still related to the fan in of the network by relation (5).
For example, in many networks a $1/\beta$ ranging from 10
to 100 may be sufficient for the optical neuron, while
the maximum fan in may be $10^3$-$10^4$. During implementation
of the ION model, $I_s^{(N)}$ of many optical devices
can be varied with the intensity of the read
beam, which effectively increases the gain and permits
a larger fan out.

As an example, a conceptual diagram of an implementation
of a single-layer feedback net is shown in
Fig. 4. It utilizes a single two-dimensional spatial
light modulator for both I and N elements. The output
of the I element is imaged onto the input of the N
element after it passes through a neutral-density filter
as the (uniform) attenuation. A uniform bias beam is
also input to the N element. The N-element output is
fed back through an interconnection hologram to the
inputs of both I and N elements, representing inhibitory
and excitatory lateral connections, respectively.

We now present a variant of the ION model that
incorporates bipolar neuron outputs in the case of
fully connected networks. The operation of the network
is given by

$$V_i' = \psi\Big(\sum_{j=1}^{N} W_{ij} V_j\Big), \qquad (6)$$

where $V_j \in [-1, 1]$ is the output of the $j$th neuron at
time $t$, $W_{ij} \in [-1, 1]$ is the normalized weight from
neuron $j$ to neuron $i$, $V_i'$ is the output of the $i$th neuron
at time $t + 1$, and $N$ is the number of neurons. A
special case of this is the bipolar binary neuron used by
Amari5 ($V_j \in \{-1, 1\}$). In this case, the nonlinear
output function $\psi(x)$ is equal to 1 for $x > 0$, otherwise it
is -1.

By using a complementary offset scheme, Eq. (6)
can be rewritten as

$$\frac{1}{2}(1 + V_i') = \hat{\psi}\Big[\sum_{j=1}^{N} \Big( \frac{(1 - W_{ij})}{2} \frac{(1 - V_j)}{2} + \frac{(1 + W_{ij})}{2} \frac{(1 + V_j)}{2} \Big)\Big], \qquad (7)$$

where $\hat{\psi}(x)$ is the nonlinear output function of the
neuron. All terms in parentheses are nonnegative and can
be represented by intensities. The neuron input and
output are in the form $(1 + V_i)/2$, and the I element is
used to generate the $(1 - V_i)/2$ term. A Hopfield net6
is identical except for the neuron outputs $V_i \in \{0, 1\}$;
for this we can replace $(1 + V_i)/2$ with $V_i$ and $(1 - V_i)/2$
with $\bar{V}_i$ in Eq. (7), where $\bar{V}_i$ is the complement of $V_i$
and is generated by the I element.
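
The complementary offset scheme rests on the identity $(1-W)(1-V)/4 + (1+W)(1+V)/4 = (1 + WV)/2$, so the sum of nonnegative products in Eq. (7) equals $N/2$ plus half of the bipolar weighted sum in Eq. (6). The Python sketch below checks this numerically; the function names and sample values are hypothetical.

```python
def bipolar_sum(weights, outputs):
    """Weighted sum with bipolar weights and outputs, as in Eq. (6)."""
    return sum(w * v for w, v in zip(weights, outputs))


def intensity_sum(weights, outputs):
    """Sum of the two nonnegative products per input, as in Eq. (7)."""
    return sum((1 - w) / 2 * (1 - v) / 2 + (1 + w) / 2 * (1 + v) / 2
               for w, v in zip(weights, outputs))


weights = [0.5, -1.0, 0.25]
outputs = [1, -1, -1]
n = len(weights)

# The intensity sum is an offset, scaled copy of the bipolar sum.
lhs = intensity_sum(weights, outputs)
rhs = n / 2 + bipolar_sum(weights, outputs) / 2
```

One consequence of this identity is that a bipolar threshold at zero corresponds to a threshold at $N/2$ on the summed intensities, an offset that is the same for every neuron in a fully connected network.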


Most of this research was presented at the 1987
annual meeting of the Optical Society of America.7

Implementation of a Subtracting Incoherent Optical Neuron

Abstract

The  Incoherent  Optical  Neuron  (ION)  model  uses  two  separate  incoherent  optical  de¬ 
vice  responses  to  subtract  inhibitory  inputs  from  excitatory  inputs  for  general  neural  networks. 
The  operational  considerations  of  this  model,  based  on  a  Hughes  liquid  crystal  light  valve,  are 
discussed. Experimental demonstrations of incoherent subtraction for binary and analog signal
levels are presented.


1  Introduction 

In  this  paper  we  demonstrate  the  feasibility  of  a  general  incoherent  optical  neuron  (ION)  model 
[1]  that  can  process  excitatory  and  inhibitory  signals  optically  without  electronic  subtraction. 
An  inner-product  type  optical  neuron  should  perform  a  nonlinear  operation  on  its  weighted  sum 
inputs  as  shown  in  Eq.  (1).  The  inputs  can  be  excitatory  (positive)  or  inhibitory  (negative). 

$$V_i = \psi\Big(\sum_{j=1}^{N} W_{ij} V_j\Big), \qquad (1)$$

where $V_i$ is the output of neuron $i$ in the same layer, $V_j$ are the signal inputs, $W_{ij}$ are the
synaptic weights, and $\psi(\cdot)$ is the output nonlinear function of the neuron, which is a nondecreasing
function and has a finite range.

A  fully  coherent  system  can  subtract  signals  directly,  using  differences  in  phase  (or  path 
length)  of  the  optical  beams.  The  tradeoff  is  that  the  system  must  be  stable  within  significantly 
less than one wavelength (~ 0.5 $\mu$m). In addition, the phases of components in the system
must  be  accurately  controlled.  These  factors  lead  to  difficulty  in  implementing  arbitrary  neural 
networks  with  a  fully  coherent  system. 

An  incoherent  system  is  more  robust  in  terms  of  stability,  position  accuracy  requirements, 
and  noise  immunity.  Existing  techniques  such  as  input  bias  or  weight  bias  methods  suffer  from 
an  input  dependent  bias  or  a  threshold  that  must  vary  from  neuron  to  neuron  [2]. 

* Proc. IEEE 3rd Annual Parallel Processing Symposium, Fullerton, CA, March 1989.


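Weight bias:

$$V_i = \psi\Big[\sum_{j=1}^{N} (W_{ij} + w_b) V_j\Big] = \psi\Big[\sum_{j=1}^{N} W_{ij} V_j + w_b \sum_{j=1}^{N} V_j\Big] \qquad (2)$$

Input bias:

$$V_i = \psi\Big[\sum_{j=1}^{N} W_{ij} (V_j + v_b)\Big] = \psi\Big[\sum_{j=1}^{N} W_{ij} V_j + v_b \sum_{j=1}^{N} W_{ij}\Big] \qquad (3)$$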

In  order  to  eliminate  the  bias  at  each  pass  through  a  neuron,  the  threshold  of  each  neuron 
must  depend  on  its  inputs  in  Eq.  (2)  and  on  its  weights  in  Eq.  (3).  A  technique  described  in  [3] 
eliminates  most  of  these  drawbacks  in  the  special  case  of  fully  connected  networks. 
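
The dependence of the unwanted bias on the inputs or on the weights can be seen in a few lines of Python. This is an illustrative sketch only; the function names, the bias symbols `w_b` and `v_b`, and the sample values are assumptions.

```python
def weighted_sum(weights_row, inputs):
    """Desired bipolar weighted sum for one neuron."""
    return sum(w * v for w, v in zip(weights_row, inputs))


def weight_bias_offset(w_b, inputs):
    """Extra term w_b * sum_j V_j in Eq. (2); it changes with the input pattern."""
    return w_b * sum(inputs)


def input_bias_offset(v_b, weights_row):
    """Extra term v_b * sum_j W_ij in Eq. (3); it changes from neuron to neuron."""
    return v_b * sum(weights_row)


weights_row = [0.5, -0.25, 0.75]

# Weight bias: the same neuron sees a different offset for each input pattern.
offset_a = weight_bias_offset(1.0, [1.0, 0.0, 1.0])
offset_b = weight_bias_offset(1.0, [1.0, 1.0, 1.0])

# Input bias: the offset is fixed by the neuron's own row of weights.
offset_c = input_bias_offset(0.5, weights_row)
```

Because `offset_a` and `offset_b` differ, a fixed threshold cannot cancel the weight-bias term, and because `offset_c` depends on the weight row, the input-bias term requires a different threshold for each neuron.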

2  The  Incoherent  Optical  Neuron  Model 


Fig. 1 (a) The ION structure. (b) Typical characteristic of a Hughes liquid
crystal light valve, serving as both I and N elements. Regions A
and C are used for the I and N elements respectively. Region B is
used to provide bias for the N element.

The Incoherent Optical Neuron (ION) model provides for incoherent subtraction in optical
neural networks without the above limitations. It uses separate device responses for inhibitory
and excitatory inputs. This is modeled after the biological neuron, which processes the excitatory
and inhibitory signals by different mechanisms (e.g. chemical-selected receptors and ion-selected
gate channels) [5]. The ION comprises two elements: an inhibitory (I) element and a nonlinear
output (N) element as shown in Fig. 1(a). The inhibitory element provides inversion of the sum
of the inhibitory signals; the nonlinear element operates on the sum of the excitatory signals,
the inhibitory element output, and an optical bias to produce the output of the neuron. The
inhibitory element is linear; the nonlinear threshold of the neuron is provided entirely by the
nonlinear output element.

Figure  2  shows  a  paradigm  for  an  optical  neural  network,  which  uses  incoherent  optical 
neurons  combined  with  optical  interconnections.  A  volume  hologram  can  be  used  to  emulate 
synaptic  weights  in  the  optical  neural  network.  The  interconnection  network  should  be  adaptive 
during the training phase. Psaltis et al. [6, 7] have discussed several learning issues in
photorefractive crystals. As shown in Fig. 2, the incoherent optical neurons process the weighted
sums from the interconnection network; they may also serve as input transducers. The inputs of
the neurons are also fed to the interconnection network to form correlations with the outputs of
the neurons during the learning phase. Then the modified interconnection strength is stored in the
interconnection network.

Fig. 2  A paradigm for an optical neural network.


3  Implementation  of  the  ION  model  by  LCLV 

A Hughes liquid crystal light valve (LCLV) is an optical light modulator which can implement
approximately $10^5$ neurons on one device. A $1.6 \times 10^3$ gate logic array based on an LCLV was
demonstrated by J. Wang and P. Chavel [10]. The typical response time of the LCLV is 30 ms.
Other devices currently under development, such as ferroelectric LCLVs, have response times on
the order of ns [4]. Fig. 1(b) shows a typical characteristic curve of an LCLV, which can be used
for both the I (region A) and N (region C) elements. Region B is required to provide bias for
the N element. In the remainder of this section we will discuss how an LCLV can be used to
implement an array of IONs.


Fig. 3  Characteristics of the (a) I and (b) N elements in the test circuit
for the condition V = 5.0 volts, f = 1.5 kHz, P = 200 mW. The I element
is fairly linear within 50% of its operation range. In (b),
the self-feedback of the N element is necessary to satisfy the ION
requirements for this particular device.

Generally, the characteristic of the LCLV in region A is unfortunately not linear (Fig. 3(a)).
Let the maximum output of the light valve in region B be $I_r$, the residue output (Fig. 1(b)).
Then the maximum operation range of the inhibitory (I) element is between 0 and $a_2$. In
order to ensure linear subtraction, we need to further limit the operation range of the I element
to be $(0, a_0)$. Thus the output of the I element can be modeled as

$$I_{out}^{(I)} = b_1 - m\,I_{inh} \qquad (4)$$

where $I_{inh} \in (0, a_0)$ is the total inhibitory input. For the biasing conditions of our LCLV,
f = 1.5 kHz and V = 5.01 volts, if we limit the maximum inhibitory input to be $0.6a_2$, the measured
root mean square deviation from linearity is 16%. Simulations we have performed indicate that
this amount of nonlinearity in the I element response is acceptable [1]. To match the response
of the nonlinear (N) element, the output of the I element is attenuated by a factor $\gamma m$, where
$m = (b_1 - b_0)/a_0$ is the magnitude of the slope of the I element response. If $\gamma$ is greater
than 1, the I element output is attenuated; otherwise we have some gain for the I element. Let the
corrected output of the I element be $I'_{inh} = I_{out}^{(I)}/\gamma m$; then the output of the N element is

$$I_{out}^{(N)} = \psi(I_{in}^{(N)} - \alpha) = \psi(I'_{inh} + I_{exc} + I_{bias} - \alpha) \qquad (5)$$

where $I_{in}^{(N)}$ represents the total input to the N element, $I_{exc}$ is the total excitatory input,
$I_{bias}$ is the bias term for the N element, which can be varied to change the threshold, and
$\alpha$ is the offset of the characteristic curve of the N element. $\psi(\cdot)$ denotes the nonlinear
output function of the neuron. The operation range of the N element is $\Delta a$ (Fig. 1(b)), which
should be equal to the output variation of the I element, $a_0/\gamma$, to provide linear subtraction.
Since $\Delta a$ depends on the bias and operating parameters of the light valve, we can adjust
$\gamma$ to fulfill this requirement. The bias of the N element is set to be

$$I_{bias} = \alpha - \frac{b_1}{\gamma m} \qquad (6)$$

From Fig. 1(b), the bias point $I_{bias}$ should be greater than $a_2$ to prevent the N element
from operating in the I element region. Meanwhile, the separation of the operation points of
the N and I elements, $\alpha - a_2$, should be greater than $a_0/\gamma$; since the output of the I
element is nonnegative, i.e., $b_1/\gamma m - a_0/\gamma \ge 0$, we can instead require

$$\alpha \ge \frac{b_1}{\gamma m} + a_2. \qquad (7)$$

This can be done by adjusting the bias voltage and frequency of the light valve, and by
selecting the proper residue output level, $I_r$. We rewrite the output of the N element, via Eqs.
(4), (5), (6), and the definitions of $m$ and $I'_{inh}$, as

$$I_{out}^{(N)} = \psi\left(I_{exc} - \frac{I_{inh}}{\gamma}\right). \qquad (8)$$

If the N element inputs are also attenuated by $\gamma$, the argument becomes $(I_{exc} - I_{inh})/\gamma$
and we get perfect subtraction. Fig. 3(a) and (b) show the characteristic curves of the I and N
elements, respectively. Self-feedback is necessary for the LCLV to fulfill the constraints of the
N element response for the ION model.
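
As a numerical check of the bias cancellation described in this section, the following Python sketch assumes a linear I element response b1 - m*I_inh, attenuation of that response by gamma*m, and an N element bias of alpha - b1/(gamma*m). The sigmoid `psi`, its gain, and all parameter values are illustrative assumptions, not measured LCLV data.

```python
import math

def i_element(i_inh, b1, m):
    """Linear I element: output falls from b1 with slope magnitude m."""
    return b1 - m * i_inh

def psi(x, gain=10.0):
    """Hypothetical sigmoid standing in for the N element nonlinearity."""
    return 1.0 / (1.0 + math.exp(-gain * x))

def ion_output(i_exc, i_inh, b1, m, gamma, alpha):
    """N element output for the attenuated I output, excitatory input, and bias."""
    i_inh_corrected = i_element(i_inh, b1, m) / (gamma * m)  # attenuate by gamma*m
    i_bias = alpha - b1 / (gamma * m)                        # bias cancels the offsets
    return psi(i_inh_corrected + i_exc + i_bias - alpha)

# The offsets b1/(gamma*m) and alpha cancel, leaving psi(i_exc - i_inh/gamma).
out = ion_output(0.5, 0.3, b1=2.0, m=1.5, gamma=2.0, alpha=3.0)
```

With these example values the argument of `psi` reduces to 0.5 - 0.3/2, independent of b1, m, and alpha.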


4  Experimental Results

SF: spatial filters
L1-6: lenses
M1-6: mirrors
P1-5: polarizers
BS1-12: beam splitters
Det1-2: power meters
CAM1-2: CCD cameras
MK1-4: masks
SI: I element input
SN: N element input

Fig. 4  Experimental setup of the test ION circuit.


Figure 4 shows the experimental setup for implementation and testing of an array of IONs.
Three input beams are used to provide the N element bias, the I element inputs, and the N element
inputs, which are controlled by polarizer pairs P1, P2, and P3, respectively. The I element input
path, SI-BS5-BS7-L2-BS8-LCLV, is an imaging path with a magnification factor of 0.8. The same
magnification factor is applied to the N element input path, SN-BS7-L2-BS8-LCLV. Two feedback
paths are implemented. One, BS9-BS10-L3-BS11-L5-BS12-BS8-LCLV, provides the I to N connection.
The other, through mirrors M5 and M6, provides the N to N self-feedback connection.

Figures 5(a) and (b) show the experimental result for binary subtraction. For the binary
case, two character sets are chosen for the N (left side) and I (right side) inputs (Fig. 5(a)). All
four possible cases are included (corresponding to 1 or 0 for the N element input, and 1 or 0
for the I element input). A bias is added to the N element inputs. Figure 5(b) shows the I element
outputs (right side) and the final neuron outputs (left side; these are also the N element outputs).
The ideal result is a portion of the character “R” (right top) and the full character “T” (right
bottom) in the N element area, and agrees with Fig. 5(b); these regions correspond to a 1 on
the excitatory inputs and a 0 on the inhibitory inputs.

Fig. 5  Results of ION subtraction. Binary subtraction with (a) N
input (left) and I input (right) patterns, and (b) the subtraction
result. (c) The normalized I input vs. N input for
a constant N output, showing grey level response. The subtraction
is close to linear if we only use 50% of the total I
input.

To test the subtraction linearity for the grey level case, we set the I input to its minimum
and the N input to a small but nonzero value and measure the N element output. For every
increment in the I input, we adjust the N input such that we get the same reading for the N
element output. Fig. 5(c) is a plot of the normalized linear subtraction of the N inputs from I
inputs. Essentially, it is equal to the complementary plot of the I element response. Here we
use the full operation range of the I element. If we limit the operation of the I inputs to be
50% of $a_2$, then we obtain subtraction that is very close to linear. In our setup, the range of
the attenuated I output is around 0.2 µW/cm². Since there is significant loss in the feedback
(I → N) path, the nonlinearity of the N element needs to be very steep (i.e., high differential
gain) to satisfy the ION requirements; thus the use of self-feedback for the N elements.

5  Discussion  and  Conclusion 

An  incoherent  subtraction  method  for  neural  nets  was  implemented  using  a  Hughes  liquid  crystal 
light  valve.  Conceptually,  the  ION  model  is  a  general  purpose  model  which  can  implement  a 
variety  of  different  neuron  types.  A  conceptual  example  for  implementing  Hopfield  type  networks 
based  on  the  ION  model  was  shown  in  [1]. 

The  ION  model  can  also  implement  Grossberg’s  mass  action  neuron,  which  has  been  used 
in  a  variety  of  neural  networks  for  pattern  recognition  [8]  and  visual  perception  [9].  The  mass 
action  neuron  can  be  described  as 

$$\dot{x}_j = -A x_j + (B - x_j) I_{exc} - x_j I_{inh} \qquad (9)$$

$$I_{exc} = \sum_{i=1}^{n_j} \psi(x_i)\, C_{ij} + I_j$$

$$I_{inh} = \sum_{i=1}^{n_j} \phi(x_i)\, D_{ij} + J_j$$

where $x_j$ is the membrane potential of neuron j, and $I_{exc}$ and $I_{inh}$ denote the total excitatory
and inhibitory inputs to neuron j. $\psi(x)$ and $\phi(x)$ are the outputs of the current neuron and its
inhibitory interneuron, respectively, and are sigmoid type functions in most cases. $I_j$ and $J_j$
are the excitatory and inhibitory inputs from other layers. A and B are the decay constant
and maximum membrane potential, respectively. $C_{ij}$ and $D_{ij}$ are the interconnection weights.
The above equation can be grouped into two terms to be implemented by the I and N elements,
respectively. For the discrete case, we can rewrite it as

$$x_j(k+1) = x_j(k)\,[1 - (A + I_{exc} + I_{inh})] + B I_{exc} \qquad (10)$$

To implement the inhibitory part, we need the I element with adaptive gain, or we can
modulate the I element read beam by $x_j$ to provide the multiplication. The second term is the
excitatory component, which can be fed to the N element directly. For the steady-state case, the
membrane potential is

$$x_j = \frac{B I_{exc}}{A + I_{exc} + I_{inh}}$$

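The relation between the discrete update and the steady-state membrane potential can be checked numerically. The Python sketch below iterates the update x(k+1) = x(k)[1 - (A + I_exc + I_inh)] + B*I_exc with constant inputs and compares the result with B*I_exc/(A + I_exc + I_inh). The function names and parameter values are illustrative assumptions; the iteration only converges when 0 < A + I_exc + I_inh < 2.

```python
def mass_action_step(x, i_exc, i_inh, A, B):
    """One discrete update of the mass action membrane potential."""
    return x * (1.0 - (A + i_exc + i_inh)) + B * i_exc

def steady_state(i_exc, i_inh, A, B):
    """Fixed point of the update for constant inputs."""
    return B * i_exc / (A + i_exc + i_inh)

def iterate(i_exc, i_inh, A, B, steps=100, x0=0.0):
    """Apply the update repeatedly, starting from membrane potential x0."""
    x = x0
    for _ in range(steps):
        x = mass_action_step(x, i_exc, i_inh, A, B)
    return x

# Hypothetical constant inputs: the iterate approaches the steady-state value.
x_final = iterate(0.3, 0.2, A=0.1, B=1.0)
x_star = steady_state(0.3, 0.2, A=0.1, B=1.0)
```

Because the steady state is a ratio of input sums, it is the quantity that the pixel-by-pixel division would have to supply optically.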
An optical implementation of pixel-by-pixel division was shown by Efron et al. [11]; here we
can use the output of the N element as the read beam of the I element to provide the required
division.

Subtracting incoherent optical neuron model: analysis,
experiment, and applications


Chein-Hsun  Wang  and  B.  Keith  Jenkins 


To  fully  use  the  advantages  of  optics  in  optical  neural  networks,  an  incoherent  optical  neuron  (ION)  model  is 
proposed.  The  main  purpose  of  this  model  is  to  provide  for  the  requisite  subtraction  of  signals  without  the 
phase  sensitivity  of  a  fully  coherent  system  and  without  the  cumbrance  of  photon-electron  conversion  and 
electronic  subtraction.  The  ION  model  can  subtract  inhibitory  from  excitatory  neuron  inputs  by  using  two 
device  responses.  Functionally  it  accommodates  positive  and  negative  weights,  excitatory  and  inhibitory 
inputs,  non-negative  neuron  outputs,  and  can  be  used  in  a  variety  of  neural  network  models.  This  technique 
can implement conventional inner-product neuron units and Grossberg’s mass action law neuron units.
Some  implementation  considerations,  such  as  the  effect  of  nonlinearities  on  device  response,  noise,  and  fan- 
in/fan-out  capability,  are  discussed  and  simulated  by  computer.  An  experimental  demonstration  of  optical 
excitation  and  inhibition  on  a  2-D  array  of  neuron  units  using  a  single  Hughes  liquid  crystal  light  valve  is  also 
reported. 




I.  Introduction 

The potential advantages of using optics in the implementation of neural networks are well
known and stem from the capability of optics for 3-D, high density interconnections and analog
data storage, as well as rapid multiplication and addition of analog signals. As an example,
consider a neural computation performed on a digital electronic machine. Figure 1 shows a
sample procedure to compute the weighted sum of the membrane potential of a neuron based on a
single digital processor. We see that over half of the time (55%) is spent on moving data and
adjusting the pointer. Although pipeline and multiprocessor techniques can be used to speed up
the computation, bottlenecks still exist in moving data between memories and registers. For N
fully connected neurons, $O(N^2)$ multiplications and summations are required. If N processors
are used so that each processor corresponds to one neuron, the computation time is $O(N)$ at best.
A fully parallel analog system can do this in $O(1)$ time. In the electronic case the partitioning
of the computation in hardware also causes limitations. This results in a trade-off that depends
on the computation overhead, hardware complexity, power dissipation, and speedup. Because of
these factors, it is pertinent to explore the use of analog optics, which has the potential of
overcoming these scaleup problems. Analog optical processing accuracy is acceptable in many
neural network applications. In addition, for pattern recognition and machine vision tasks, the
inputs are light intensity. Optical neural networks can potentially process these inputs directly
without any serial electronic conversion, increasing the likelihood of a fast, efficient system.
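
To make the operation count above concrete, the Python sketch below computes one update of N fully connected neurons on a single processor and counts the multiply-accumulate steps. The function name and the example weights are illustrative assumptions and are not taken from the procedure of Fig. 1.

```python
def weighted_sums(weights, outputs):
    """Return each neuron's weighted input sum and the number of
    multiply-accumulate operations performed serially."""
    n = len(outputs)
    sums = [0.0] * n
    mac_count = 0
    for i in range(n):          # one pass per receiving neuron
        for j in range(n):      # one multiply-accumulate per connection
            sums[i] += weights[i][j] * outputs[j]
            mac_count += 1
    return sums, mac_count

n = 4
w = [[0.5] * n for _ in range(n)]   # example weights (hypothetical)
v = [1.0] * n                       # example neuron outputs (hypothetical)
sums, macs = weighted_sums(w, v)    # macs equals n * n
```

Assigning one processor per neuron removes the outer loop only; the inner loop over connections is what a fully parallel analog system performs in a single step.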

There are four main arithmetic operations in a conventional neural network: multiplication,
addition, subtraction, and nonlinear thresholding. Optics can provide analog multiplication and
addition in real time, while nonlinear thresholding can be performed by an optical modulator.
Implementation of subtraction in an optical neural network is a key issue. Coherent and
incoherent techniques for subtraction differ markedly.

A fully coherent optical system can subtract signals directly, using differences in the
phase, or path length, of the optical beams. An example of such a system is in Ref. 1.
Subtraction in this type of system is very efficient; the trade-off is that the system must keep
relative path lengths stable to within much less than one wavelength. In addition, the phases of
components in the system must be accurately controlled.

An incoherent system is more robust in terms of stability, position accuracy requirements,
and noise immunity. Signals are typically encoded as light intensities; this provides real,
non-negative quantities which must at some point be subtracted. A variety of techniques for
linear incoherent optical subtraction have been demonstrated by others (Refs. 2 and 3). Many of
these techniques result in an absolute value of the difference $|Y - X|$, where Y and X are the
image operands. The technique described in Ref. 3 uses a liquid crystal light valve (LCLV) and
results in the difference image added to a constant bias image, i.e., $Y - X + B$, where B is a
constant bias. Here we modify and extend this concept to provide directly a nonlinear function of
a linear subtraction, which includes a threshold below zero and above some user-defined value
(per neuron model). In this case, there is no need for an output bias, and in our model there is
none; this enables direct cascadability as required in a neural net.

Figure 2 shows a paradigm for an optical neural network, which uses incoherent optical
neurons combined with optical interconnections. Here, incoherent means the phase of the input
light to the neuron is not being used for subtraction; the neuron outputs may still be coherent
light. A hologram or holograms can be used to emulate synaptic weights in the optical neural
network. The interconnection network should be adaptive during the training phase. Psaltis et al.
(Refs. 4 and 5), for example, have discussed several learning and recalling issues in
photorefractive crystals. As shown in Fig. 2, the incoherent optical neurons process the weighted
sums from the interconnection network; they may also serve as an input transducer. The inputs of
the neurons are also fed to the interconnection network to form correlations with the outputs of
the neurons during the learning phase. Then the modified interconnection strength is stored in
the interconnection network.

References 6 and 7 give examples of previous incoherent optical neural networks. Farhat and
Psaltis et al. (Ref. 6) used a hybrid electronic-optical scheme to implement a Hopfield net.
Recently, Shariv and Friesem (Ref. 7) demonstrated an all-optical neural network with only
inhibitory neurons for the case of a Hopfield type network.

The objective of this paper is to present a model of an incoherent optical neuron (ION) for
general optical neural networks and to assess its practicality. Its practicality is assessed via
computer simulations of the ION in a neural network and via an experimental demonstration of an
array of IONs using a liquid crystal light valve.

Section II reviews several existing techniques for incoherent subtraction utilizing an input
and/or weight bias, in some cases coupled with complementary weights or inputs. These techniques
suffer from bias buildup and/or thresholds that must vary from neuron to neuron. A technique
described by TeKolste and Guest (Ref. 8) eliminates these problems for the case of fully connected
networks. Section III describes the ION model. Its uniform, fixed bias decreases the hardware
complexity over previous techniques. It can be used to implement a variety of neuron models,
including the binary neuron (e.g., McCulloch-Pitts, 1943, Ref. 9) and analog neuron (e.g.,
Grossberg, 1973, Ref. 10; Fukushima, 1975, Ref. 11).

Fig. 1. Sample procedure to calculate membrane potential based
on a uniprocessor. The value shown in parentheses is the number of
clock cycles for an Intel 80386 processor. Clock period is 50 ns.
Only 45% of the time is used in actual computation.

Fig. 2. Paradigm for an optical neural network.

Section IV discusses device and system imperfections that might affect the operation of the
ION model, including undesired nonlinearities in the device response, as well as device and
system noise. The immunity of the ION model to these imperfections is analyzed via computer
simulations of it in a competitive neural network in Sec. V. Section VI reports on an
experimental demonstration of incoherent subtraction using a Hughes liquid crystal light valve to
implement a 2-D array of IONs. Section VII discusses a variant of the ION model, the linear
subtraction ION. A special case of this model is used in the Amari (Ref. 12)/Hopfield (Ref. 13)
net. Applications of the ION model in two kinds of neural network are discussed in Sec. VIII,
i.e., Fukushima’s multilayered networks (Refs. 11, 14-18) and a version of Grossberg’s shunting
networks (Refs. 19-23). A device requirement analysis for the ION is detailed in the Appendix.

II. Methods for Incoherent Subtraction

We model the recall or computation process of a single layer of a neural network as

$$V_i' = \psi\left(\sum_{j=1}^{N} W_{ij} V_j\right), \qquad (1)$$

where $V_i'$ is the output of neuron i, $V_j$ are the signal inputs, $W_{ij}$ are the synaptic
weights, and $\psi(\cdot)$ is the output nonlinear function of the neuron, which is a nondecreasing
function of the neuron inputs and has a finite range. Generally, $W_{ij}$ can be positive, zero, or
negative; $V_i'$ and $V_j$ are non-negative in many neuron models but can take on negative values
in other models.

In this section we review several methods of subtraction in incoherent optical neurons.
Conceptually, the interpretation of an inhibitory signal can be either positive weight/negative
signal or negative weight/positive signal. Of course, a bias can be added to either the input
signals or the weights, yielding subtraction in a straightforward manner. First, the weight-bias
method uses negative weights and positive outputs to code the inhibitory signals. The output of
the neuron i is

$$V_i' = \psi\left[\sum_{j=1}^{N} (W_{ij} + W_b) V_j - W_b \sum_{j=1}^{N} V_j\right], \qquad (2)$$

where $W_{ij}$ represents the connection strength from the jth neuron to the ith neuron, which can
be either positive or negative and is normalized to be between -0.5 and 0.5. When $W_{ij}$ is
positive, it represents an excitatory connection; when it is negative, it is inhibitory and is
subtracted from the excitatory inputs. The term $W_b$ is the bias in weight, usually 0.5, and
$V_j$ is the output of the jth neuron, which is positive and is between 0 and 1. The non-negative
values $(W_{ij} + W_b)$ and $V_j$ can then be implemented using incoherent optics. The second term
$W_b \sum_{j=1}^{N} V_j$ is an input dependent bias, which is impractical for realization due to
the required dynamic threshold of the neurons.

The second technique biases the input signals and uses positive weights and negative inputs
for inhibitory signals and can be written as

$$V_i' = \psi\left[\sum_{j=1}^{N} W_{ij}(V_j + V_b) - V_b \sum_{j=1}^{N} W_{ij}\right]. \qquad (3)$$

In this case the non-negative quantities $W_{ij}$ and $(V_j + V_b)$ are represented physically
using incoherent optics. The second term $V_b \sum_{j=1}^{N} W_{ij}$ is a weight dependent bias
term. Since the weights of the neural network are changed from time to time to adapt to their
environment by learning, this term is difficult to implement. We need to calculate the sum of
weights into each neuron, which increases the implementation complexity, particularly if volume
holograms are used. Nevertheless, this intensity bias method can potentially be used in the
special case of neural networks that assume conservation of the sum of weights into a neuron.
Such networks have been described, for example, by von der Malsburg (Ref. 24). In this case the
second term $V_b \sum_{j=1}^{N} W_{ij}$ is a constant and can be treated as a fixed threshold of
the optical neuron.
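
Both bias methods above rest on a simple algebraic identity: adding a bias to the weights (or to the inputs) makes every optical quantity non-negative, and subtracting the resulting bias term restores the signed weighted sum. The Python sketch below checks this for one neuron. The function names and example values are illustrative assumptions.

```python
def direct_sum(w_row, v):
    """Signed weighted sum that the neuron should receive."""
    return sum(w * x for w, x in zip(w_row, v))

def weight_bias_sum(w_row, v, w_b=0.5):
    """Weight-bias method: optical sum with weights (w + w_b) >= 0,
    then removal of the input dependent bias w_b * sum(v)."""
    optical = sum((w + w_b) * x for w, x in zip(w_row, v))
    return optical - w_b * sum(v)

def input_bias_sum(w_row, v, v_b=0.5):
    """Input-bias method: optical sum with inputs (v + v_b) >= 0,
    then removal of the weight dependent bias v_b * sum(w)."""
    optical = sum(w * (x + v_b) for w, x in zip(w_row, v))
    return optical - v_b * sum(w_row)

w_signed = [0.25, -0.5, 0.125]   # signed weights in [-0.5, 0.5] (hypothetical)
v_pos = [0.5, 1.0, 0.25]         # non-negative inputs (hypothetical)
w_pos = [0.2, 0.4, 0.1]          # non-negative weights (hypothetical)
v_signed = [0.3, -0.5, 0.25]     # signed inputs in [-0.5, 0.5] (hypothetical)
```

The bias removed in `weight_bias_sum` changes with every input pattern, while the one removed in `input_bias_sum` changes only when the weights change, which is why the latter becomes a fixed threshold when the weight sum is conserved.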

A similar technique called bias subtraction has been discussed by Gmitro and Gindi (Ref. 25).
In their approach, the output of the neuron is positive and the weight can be either positive or
negative to represent an excitatory or inhibitory connection efficiency. During implementation,
two channels are used to process positive and negative weights separately; the negative channel
is implemented by positive weights and complementary input signals, which are also positive:

$$V_i' = \psi\left[\sum_j W_{ij}^{+} V_j + \sum_j W_{ij}^{-}(1 - V_j)\right] = \psi\left[\sum_j W_{ij}^{+} V_j - \sum_j W_{ij}^{-} V_j + \sum_j W_{ij}^{-}\right], \qquad (4)$$

where $W_{ij}^{+}$ and $W_{ij}^{-}$ are the weights to the positive (excitatory) and negative
(inhibitory) inputs of the ith neuron. They are positive quantities during optical implementation
and are represented by the transmittance of a mask or diffraction efficiency of a hologram.
The bias term $\sum_j W_{ij}^{-}$ must be canceled out by adjusting the threshold of each neuron.
This increases implementation complexity.
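
The two-channel scheme can be sketched in Python as follows, assuming inputs normalized to [0, 1] so that the complementary input is 1 - V. The function name and the example weights are illustrative assumptions. The sketch returns the all-positive optical sum together with the bias that each neuron's threshold would have to cancel.

```python
def two_channel_sum(w_pos, w_neg, v):
    """Return (optical_sum, bias) for one neuron.

    w_pos, w_neg: non-negative excitatory and inhibitory weights.
    v: non-negative inputs in [0, 1]; the inhibitory channel sees 1 - v.
    """
    optical = sum(wp * x + wn * (1.0 - x) for wp, wn, x in zip(w_pos, w_neg, v))
    bias = sum(w_neg)   # weight dependent term to be removed by the threshold
    return optical, bias

w_pos = [0.5, 0.0, 0.25]   # hypothetical excitatory weights
w_neg = [0.0, 0.5, 0.25]   # hypothetical inhibitory weights
v = [1.0, 0.5, 0.25]       # hypothetical inputs
optical, bias = two_channel_sum(w_pos, w_neg, v)
signed = sum((wp - wn) * x for wp, wn, x in zip(w_pos, w_neg, v))
```

Here `optical - bias` equals the signed sum, but `bias` differs from neuron to neuron whenever the inhibitory weights differ, which is the source of the added complexity noted above.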

Another technique, proposed by TeKolste and Guest (Ref. 8), is a hybrid of the previous two:


The bias term N/2 is independent of input as well as weight, which simplifies the requisite
hardware considerably. This approach applies to unipolar neurons and fully connected networks
only.


III.  Incoherent  Optical  Neuron  (ION)  Model 

From a biological point of view, the inhibitory site can be treated as a distinct mechanism
from the excitatory site, e.g., neurotransmitters vs chemical-selected receptors (Refs. 27, 29).
The inhibitory site then accepts a positive input to produce a negative effect on the membrane of
the neuron. The ION model follows this approach; it uses spatially and physically distinct
control mechanisms to emulate the excitatory and inhibitory signal processing in a biological
neuron.

Fig. 3. The ION model: (a) normalized inhibitory (I) element; (b)
unnormalized I element; (c) nonlinear (N) element; and (d) the ION
structure.

The ION comprises two elements: an inhibitory (I) element and a nonlinear output (N)
element. The inhibitory element provides an inversion of the sum of the inhibitory signals; the
nonlinear element operates on the excitatory signals, the inhibitory element output, and an
optical bias to produce the output. The inhibitory element is linear; the nonlinear threshold of
the neuron is provided entirely by the nonlinear output device. Figures 3(a)-(c) show the
characteristic curves of the I and N elements, respectively. The structure of the ION model is
illustrated in Fig. 3(d).
The output of the normalized I element is given by

$$I_{out}^{(I)} = 1 - I_{inh} \qquad (6)$$

and of the N element is given by

$$I_{out}^{(N)} = \psi(I_{out}^{(I)} + I_{exc} + I_{bias} - \alpha), \qquad (7)$$

where $I_{inh}$ and $I_{exc}$ represent the total (weighted) inhibitory and excitatory neuron
inputs, respectively. These include any lateral feedback signals as well as inputs from other
layers. $I_{bias}$ is the bias term for the N element, which can be varied to change the
threshold, and $\alpha$ is the offset of the characteristic curve of the N element (refer to the
Appendix for a more detailed discussion of $\alpha$). The term $\psi(\cdot)$ denotes the nonlinear
output function of the neuron. If we choose $I_{bias}$ to be $\alpha - 1$, the output of the N
element is

$$I_{out}^{(N)} = \psi(I_{exc} - I_{inh}), \qquad (8)$$

which is the desired subtraction.
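
The normalized ION can be sketched in a few lines of Python, assuming an I element output of 1 - I_inh, an N element that applies its output function to the sum of its inputs minus the offset alpha, and a bias of alpha - 1. The piecewise-linear `psi` and the value of `alpha` are illustrative assumptions standing in for a real device characteristic.

```python
def psi(x, saturation=1.0):
    """Hypothetical output function: zero below 0, clipped above `saturation`."""
    return min(max(x, 0.0), saturation)

def normalized_ion(i_exc, i_inh, alpha=2.0):
    """Output of a normalized ION for total excitatory and inhibitory inputs."""
    i_element_out = 1.0 - i_inh    # I element inverts the inhibitory sum
    i_bias = alpha - 1.0           # bias choice that cancels both offsets
    return psi(i_element_out + i_exc + i_bias - alpha)

net_excitation = normalized_ion(0.75, 0.25)   # excitation exceeds inhibition
net_inhibition = normalized_ion(0.25, 0.75)   # inhibition exceeds excitation
```

All physical quantities entering the N element are non-negative, yet the output depends only on the difference of the two input sums and carries no residual bias, which is what allows one layer to feed the next directly.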

Fig. 4. Homogeneous case example: typical characteristic of a
Hughes liquid crystal light valve serving as both I and N elements.
Regions A and C are used for the I and N elements, respectively.
Region B serves to separate the I element from the N element.

In general, the I element will not be normalized [Fig. 3(b)]. In this case the offset and
slope of its response can be adjusted using $I_{bias}$ and an attenuating element (ND filter),
respectively, again enabling proper subtraction. (The unnormalized I element must have
gain > 1.) More quantitatively, the characteristic curves of the unnormalized I and N elements
[Figs. 3(b) and (c)] can be modeled as

$$I_{out}^{(I)} = -\frac{b_1}{a_1} I_{inh} + b_1, \quad \text{for } 0 \le I_{inh} \le a_1,$$

$$I_{out}^{(N)} = \psi(I_{in}^{(N)} - \alpha), \quad \text{for } \alpha \le I_{in}^{(N)}, \qquad (9)$$

where $I_{in}^{(N)}$ denotes the sum of all inputs to the N element. The output of the I element
is attenuated by an ND filter [Fig. 3(d)]; the intensity transmittance of the attenuator is equal
to the reciprocal of the magnitude of the slope of the characteristic curve of the I element. In
this case, the bias for the N element is changed to $\alpha - a_1$.

The ION model can be implemented using separate devices for the I and N elements
(heterogeneous case) or by using a single device with a nonmonotonic response to implement both
elements (homogeneous case). Possible devices include bistable optical arrays (Refs. 30-32) and
spatial light modulators (SLMs) such as liquid crystal light valves (LCLV) (Refs. 33-34). A
single Hughes liquid crystal light valve can be used to implement both elements (Fig. 4).

A positive neuron threshold ($\theta$) can be implemented in the ION by decreasing the bias by
the same amount $\theta$, to $I_{bias} - \theta$ (Fig. 4). Similarly, a negative threshold is
realized by increasing the bias by $\theta$, to $I_{bias} + \theta$. Distinct from the Gmitro and
Gindi bias subtraction technique, this model inverts only the sum of inhibitory inputs to
a neuron.

A detailed analysis for implementing the ION model is given in the Appendix, which describes
the device requirements and threshold implementation as well as constraints on fan-in, fan-out,
and gain. The maximum fan-in for the ION model is determined by the extinction ratio of the
device, while the maximum fan-out is bounded by the device differential gain and system loss.

The features of the ION model include a bias that is essentially independent of input weight
and signals, a dynamically and globally variable threshold, a capability of implementing a
sigmoid or binary threshold function for different neuron models, cascadability, and ease of
implementation. Due to the separation of the control mechanisms in the ION model, it can
implement more complex neuronal functions; for example, global inhibition by using the output of
one inhibitory element to control the reading beam intensity of many excitatory elements.

IV.  Effect  of  Device  and  System  Imperfections 

A. Nonlinearity in the I Element

Generally, the physical input/output characteristic of an I element will not be linear (see,
e.g., Fig. 4). Subtraction can be obtained by further limiting the operation range of the I
element to a sufficiently linear region. Let this region be $[a_m, a_M]$. In this case, the I
element needs a bias $a_m$ to operate in the linear region. Thus, the output of the I element can
be modeled as

$$I_{out}^{(I)} = b_1 - \frac{b_1 - b_0}{\Delta a_I}\, I_{inh}, \qquad (10)$$

where $\Delta a_I = a_M - a_m$ is the input range of the I element, $I_{inh}$ is the inhibitory
signal input, and $I_{inh} + a_m$ is the total intensity input to the I element. We define an
effective slope of this pseudo linear region as $m^* = (b_1 - b_0)/\Delta a_I$, and then attenuate
the output of the I element by $m^*$. The N element bias is set to be $\alpha - b_1/m^*$, and
properly scaled subtraction is again obtained.

For example, for the biasing conditions of our LCLV
(given in Sec. VI), if the inhibitory input is limited to
the region $[0, 0.6a_2]$, the normalized root mean square
deviation from linearity (nmse) is reduced to 16% from
approximately 36%. The quantity $a_2$ is defined as the
input that is just sufficient to drive the output of the
device to a functional 0 (see the Appendix). Here the
normalized root mean square error is defined as


where $y(x)$ and $\hat{y}(x)$ are the ideal and actual outputs of
the device with the input value of $x$, respectively.
Simulations  we  have  performed  indicate  that  this 
amount  of  nonlinearity  (16%)  in  the  I  element  response 
is  acceptable. 
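As a rough illustration of this linearity measure, the sketch below compares a response curve with the straight line through its end points over a restricted input region. The normalization by the RMS of the ideal output, the function names, and the quadratic test curve are all assumptions made for illustration; the curve is not the measured LCLV characteristic.

```python
import math

def nrms_deviation(ideal, actual):
    """RMS of (ideal - actual), normalized by the RMS of the ideal output.

    The choice of normalization is an assumption, not taken from the paper.
    """
    if len(ideal) != len(actual) or not ideal:
        raise ValueError("need two equal-length, non-empty sequences")
    err = sum((y - ya) ** 2 for y, ya in zip(ideal, actual))
    ref = sum(y ** 2 for y in ideal)
    return math.sqrt(err / ref)

def deviation_from_linearity(response, x_max, samples=201):
    """Compare response(x) on [0, x_max] with the line through its end points."""
    xs = [x_max * k / (samples - 1) for k in range(samples)]
    actual = [response(x) for x in xs]
    y0, y1 = actual[0], actual[-1]
    ideal = [y0 + (y1 - y0) * x / x_max for x in xs]
    return nrms_deviation(ideal, actual)

def toy_response(x):
    """Hypothetical inverting characteristic, used only for illustration."""
    return 1.0 - x * x

full_range = deviation_from_linearity(toy_response, 1.0)
restricted = deviation_from_linearity(toy_response, 0.6)
```

For this toy curve the deviation over the restricted region is smaller than over the full range, which is the qualitative effect described above; the specific percentages in the text come from the measured device, not from this sketch.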

B.  Noise  Model  for  the  ION 

Here  we  use  noise  to  mean  any  undesired  signals, 
including  perturbation  of  the  operating  point  of  the 
device,  nonuniformity  of  the  device,  variation  in  oper¬ 
ating  characteristics  from  device  to  device  due  to  pro¬ 
duction  variation,  internal  noise  inherent  to  the  de¬ 
vice,  and  environmental  effects.  Some  of  these  effects 
are  global  (they  affect  all  neuron  units  on  a  device 
identically),  others  are  localized  (each  neuron  unit  be¬ 
haves  differently;  the  noise  on  neighboring  neuron 
units  on  a  device  may  be  independent  or  correlated, 
depending  on  the  source  of  the  noise).  Both  temporal 
and spatial characteristics of the noise need to be included.
The effect of noise on an additive lateral
inhibitory network was discussed by Stirk et al.35
Here,  we  construct  a  noise  model  for  the  ION  by  con¬ 
sidering  the  origin  and  impact  of  the  noise  sources. 

The  possible  noise  sources  in  the  ION  model  can  be 
classified  into  four  categories:  input  noise,  device 
noise,  system  noise,  and  coupling  noise.  The  input 
noise  can  be  modeled  at  the  device  input  and  includes 
environmental  background  noise  and  residual  output 
of  the  optical  devices.  Essentially,  it  has  nonzero 
mean  and  varies  slowly  with  time.  The  device  noise  is 
mainly  caused  by  uncertainty  in  the  device’s  charac¬ 
teristics,  for  example,  drift  of  the  operating  point  and 
variation  of  gain  due  to  temperature  or  other  effects. 
The  system  noise  has  a  global  effect  on  all  neuron  units 
on  an  optical  device  and  includes  fluctuations  in  the 
optical  source.  Finally,  the  coupling  noise  (crosstalk) 
is  due  to  poor  isolation  between  the  optical  neuron 
units,  crosstalk  from  the  interconnection  network,  and 
imperfect  learning.  As  noted  in  Ref.  35,  alignment 
inaccuracies  and  imperfect  focusing  and  collimating 
optics  also  cause  localized  crosstalk.  Coupling  noise  is 
signal  dependent. 

1.  Input  Noise 

Let the environmental background noise for the I
and N elements be denoted by $N_b^{(I)}$ and $N_b^{(N)}$, respectively.
The total residual output noise $N_r$ is caused by
residual output of the optical device. At the input of
an incoherent optical neuron it is $N_r = \sum_j W_{ij} I_r / N_{\rm out}$,
which is weight dependent and varies slowly with time
due to learning. Term $W_{ij}$ is the interconnection
strength from neuron $j$ to neuron $i$, $I_r$ is the residual
output of the optical device [Fig. 3(c)], and $N_{\rm out}$ denotes
the fan-out of the optical neuron unit. Perturbation
of the weights can be treated as an input
dependent noise source as $N_w = \sum_j \Delta W_{ij} x_j$, where
each $\Delta W_{ij}$ is assumed independent. For the interconnection
network, imperfect learning of the weights,
nonuniformity  of  the  weights,  residual  weights  after 
reprogramming,  and  perturbation  of  the  reference 
beam  intensity  will  cause  weight  noise.  Since  each  of 
these  noise  sources  can  occur  at  the  input  of  both  I  and 
N  elements,  the  output  of  the  ION  for  the  case  of 
normalized  characteristics  is 

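$$I_{\rm out} = \psi\{1 - [I_{\rm inh} + N_b^{(I)} + N_r^{(I)} + N_w^{(I)}] + I_{\rm exc} + N_b^{(N)} + N_r^{(N)} + N_w^{(N)} + [\alpha - 1] - \alpha\}. \qquad (12)$$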

If the background noise is space invariant and the I
and N elements have the same device area, the terms
$N_b^{(I)}$ and $N_b^{(N)}$ will cancel out. The residual noise terms
$N_r^{(I)}$ and $N_r^{(N)}$ and weight noise terms $N_w^{(I)}$ and $N_w^{(N)}$
generally do not cancel.

2.  Device  Noise 

We model two noise effects in the I element, as
illustrated in Figs. 5(a) and (b): shift (drift) and gain
variation in the device characteristics, which are denoted
as $N_d^{(I)}$ and $N_g^{(I)}$, respectively. For the output N
element, the gain variation [Fig. 5(e)] only modifies the
nonlinearity of the element N. If this gain variation is
a slowly varying effect, it will have little effect on the
dynamic behavior of the network; so for the N element
we consider only drift, which is denoted by $N_d^{(N)}$. The
N element drift can be horizontal [$N_{dh}^{(N)}$] [Fig. 5(c)] or
vertical [Fig. 5(d)]. The vertical drift of one neuron
unit becomes an additive noise at the input of the next
neuron unit, and so will be approximated by including
it in the residual noise term above. The horizontal
drift has the same effect as a perturbation in the bias
term, denoted by $N_{bp}$.

Fig. 5. Modeling the device noise of an incoherent optical neuron:
I element with (a) drift (vertical/horizontal), (b) gain variation;
N element with (c) horizontal drift or variation of the bias;
(d) vertical drift, and (e) gain variation.

If the gain variation is small, it can be included in the
I element response by expressing its output as
$(1 + N_g)(1 - I_{\rm inh})$, where $N_g$ denotes the gain noise.

3.  System  Noise 

System  noise  has  a  global  effect  on  the  ION.  If  it  is 
caused  by  an  uncertainty  in  the  optical  source,  it 
causes  a  variation  in  the  characteristic  curve  of  the 
device  and  a  perturbation  in  the  bias  term.  In  the  case 
of  an  LCLV,  a  perturbation  in  the  reading  beam  inten¬ 
sity  produces  a  gain  variation  in  the  I  element  and  a 
combination  of  gain  variation  and  horizontal  drift  in 
the  N  element  (or  equivalently,  essentially  a  rescaling 
of  its  output  axis).  Device  gain  variation  of  the  I 
element  was  discussed  above  and  is  local,  i.e.,  it  varies 
from  one  neuron  unit  to  another  on  a  given  device. 
Variation  in  gain  due  to  system  noise  is  global.  The 
normalized  device  noise  and  system  noise  can  be  mod¬ 
eled  as 

$$I_{\rm out} = \psi\{[1 + N_g^{(I)}][1 + N_d^{(I)} - I_{\rm inh}] + I_{\rm exc} + N_{dh}^{(N)} + (\alpha - 1 + N_{bp}) - \alpha\}. \qquad (13)$$

4.  Crosstalk 

Crosstalk can be caused by the physical construction
of the interconnection network [e.g., coupling between
different holograms, diffraction in the detection (neuron
unit inputs) plane, inaccurate alignment and focusing].
It can also be caused by imperfect learning or
reprogramming of the synaptic weights, where the perturbation
of different weights is correlated. In general,
crosstalk is signal dependent and varies from one
neuron unit to another on a given device. It can be
modeled as an input noise to the I and N elements. It
is excluded from our current simulations because it is
signal dependent.

Based on the above discussion, these noise terms can
be grouped into additive ($N_+^{(I)}$) and multiplicative ($N_\times^{(I)}$)
noise of the I element and additive noise ($N_+^{(N)}$) of the
output element N. The general noise model of the
ION can then be written as

$$I_{\rm out} = \psi\{[1 + N_\times^{(I)}][1 - I_{\rm inh} + N_+^{(I)}] + I_{\rm exc} + N_+^{(N)} - 1\}, \qquad (14)$$

where $N_+^{(I)}$ is the sum of the drift noise [$N_d^{(I)}$], background
noise [$N_b^{(I)}$], residual noise [$N_r^{(I)}$], weight noise
[$N_w^{(I)}$], and crosstalk noise [$N_c^{(I)}$]; $N_\times^{(I)}$ is the gain noise of
the I element; and $N_+^{(N)}$ is the sum of the background
[$N_b^{(N)}$] and residual input noise, horizontal shift noise
[$N_{dh}^{(N)}$], weight noise and crosstalk noise of the N element,
and bias noise [$N_{bp}$].
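The grouping into one multiplicative and two additive noise terms can be sketched as a small function. This is a minimal sketch under stated assumptions: the additive I element noise is taken to enter alongside the inhibitory input, the N element bias and offset are taken to combine to a net -1 in normalized units, and the output function is assumed to be a clipped linear response. The names `ion_output` and `clipped_linear` are hypothetical.

```python
def clipped_linear(x):
    """Assumed N element output function: linear on [0, 1], clipped outside."""
    return min(1.0, max(0.0, x))

def ion_output(i_exc, i_inh, n_add_i=0.0, n_mul_i=0.0, n_add_n=0.0,
               psi=clipped_linear):
    """Normalized ION output with grouped noise terms.

    n_add_i: additive noise at the I element
    n_mul_i: multiplicative (gain) noise of the I element
    n_add_n: additive noise at the N element
    """
    # Noisy inverted inhibitory sum produced by the I element.
    i_element = (1.0 + n_mul_i) * (1.0 - i_inh + n_add_i)
    # N element input: I element output, excitatory sum, noise, net offset of -1.
    n_input = i_element + i_exc + n_add_n - 1.0
    return psi(n_input)
```

With all noise terms set to zero the function reduces to `psi(i_exc - i_inh)`, the ideal subtraction.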

V.  Computer  Simulation 

A.  I  Element  Compensation 

To assess the effect of imperfect device responses for
the I element, we have performed simulations on a
variant of Grossberg's on-center off-surround competitive
network10,26 for edge detection (Fig. 6). The network
contains thirty inner-product type neurons connected
in a ring structure with input and lateral on-center
off-surround connections. To optimize the use
of the (nonlinear) I element, an attenuator (neutral
density filter) can be placed in front of the I element to
reduce the overall gain to bring it closer to the ideal
response. Figure 7 shows computer simulations of the
network responses based on a nonlinear curve36 that is
a close approximation to our measured LCLV characteristic
(Sec. VI) for different input attenuations.
Each resolvable row of Fig. 7 represents a 1-D simulation
on a distinct 1-D input. Thirty different binary
inputs were each simulated at four different input
signal levels. Apparently the attenuation can have a
tolerance of approximately ±20%. In our simulations
we used an attenuator but no input bias to the I element;
the region of operation for these curves extended
over most of the input range of the device ($[0, 0.7a_2]$).

Fig. 6. On-center off-surround competitive neural network: (a)
network and (b) interconnection weight strengths as a function of
distance from a neuron. Interconnections in this network are space
invariant. [Figure legend: excitatory input; inhibitory input.]

B.  Noise  Effect 

We use the same network to test the effect of noise
(of course the results are actually network dependent)
to get an idea of the noise immunity and robustness to
physical imperfections of the ION model. In the computer
simulation, each of the three noise sources in Eq.
(14) is assumed to be independently distributed Gaussian
with zero mean. We define the maximum perturbation
p of the noise source as twice the standard deviation,
expressed as a percentage of the input signal
level. A normalized mean square error (nmse) [see Eq.
(11)] is used to measure the acceptability of the result.
Although it is not a perfect measure, an nmse < 0.1-0.15
generally looks acceptable for the network response
to our input test pattern.

The  quality  of  the  output  is  a  function  of  the  tempo¬ 
ral  and  spatial  correlation  of  the  noise.  Figure  8(a) 
shows  the  nmse  vs  percentage  of  maximum  noise  per¬ 
turbation  for  the  input  level  of  0.7  and  for  noise  that  is 
correlated  over  different  time  periods  T.  The  noise 
sources  for  each  neuron  unit  are  assumed  independent 
and identically distributed. The temporal correlation
of each noise source with its previous values is given by
$N(t + 1) = \sum_i h_i N(t + 1 - i)$, and the correlation
coefficients $h_i$ decrease linearly with $i$ (to $h_T = 0$). In
Fig.  8(a),  all  three  noise  sources  in  our  model  are 
present  and  have  the  same  variance.  If  the  acceptance 
nmse  criterion  is  0.15,  a  perturbation  of  ±10%  on  each 
noise  source  yields  an  acceptable  result  in  all  cases. 
The  nmse  increases  as  the  input  level  and  noise  vari¬ 
ance  increase  [shown  in  Fig.  8(b)  for  T  =  50]. 
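One way to generate noise of this kind is sketched below. It is an assumption-laden illustration rather than the simulation code used for Fig. 8: the correlated sequence is produced as a moving average of white Gaussian samples with linearly decreasing coefficients, the output is rescaled so that twice its standard deviation equals the maximum perturbation p, and the function names are hypothetical.

```python
import random

def linear_taps(period):
    """Coefficients h_0 .. h_{T-1}, falling linearly from 1 toward h_T = 0."""
    return [1.0 - i / period for i in range(period)]

def correlated_noise(n_steps, period, max_perturbation, rng):
    """Zero-mean Gaussian noise correlated over `period` time steps.

    max_perturbation is interpreted as twice the standard deviation.
    """
    taps = linear_taps(period)
    norm = sum(h * h for h in taps) ** 0.5   # keeps the output std equal to sigma
    sigma = max_perturbation / 2.0
    # Extra leading samples so the first output already has a full history.
    white = [rng.gauss(0.0, 1.0) for _ in range(n_steps + period - 1)]
    out = []
    for t in range(n_steps):
        acc = sum(h * white[t + period - 1 - i] for i, h in enumerate(taps))
        out.append(sigma * acc / norm)
    return out

rng = random.Random(0)
noise = correlated_noise(200, 25, 0.10, rng)   # T = 25, p = 10% of unit input
```

Setting `period` to 1 gives uncorrelated samples, which corresponds to the quickly varying case.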

In  some  cases,  the  noise  is  spatially  correlated.  We 
simulated  the  network  with  spatially  correlated  noise. 
The  spatial  correlation  is  assumed  to  have  a  Gaussian 
profile.  Figures  9(a)  and  (b)  are  the  responses  for  a 
spatial  correlation  range  of  3  and  13,  respectively, 
while  Figs.  9(c)  and  (d)  show  the  responses  for  spatially 
and  temporally  correlated  noise. 

Drift of the device characteristic is a global effect.
Figure 10 simulates slowly varying and quickly varying
I and N element drifts on this network; a ±10-15%
perturbation in drift is apparently acceptable. Figure
11 shows the effect of local I element gain variation
that is spatially correlated; a ±10-15% perturbation in
gain is apparently acceptable.


Fig. 7. Network responses for different attenuation factors
at the input to the nonlinear inhibitory element. Each
horizontal scan line of the figure represents a separate
simulation on a 1-D input: (a) input pattern; (b) attenuation
factor 1.0, no attenuation; (c) 1.5; (d) 2.5; (e) 3.0; and (f)
3.5. The ideal output is essentially identical to (d).

Fig. 8. Normalized mean square error (nmse) measure of the network
response for temporally correlated noise. Three noise sources
(additive and multiplicative noise of the I element and additive
noise of the N element) are simulated. (a) Normalized mean square
error of the net output vs maximum noise perturbation p for correlation
periods (T) ranging from 1 to 50. The input level is 0.7. (b) Output
nmse plot for different noise perturbations and input levels (T = 50).


VI.  Experiment 

Figure 12 shows the experimental setup for implementing
and testing an array of IONs. Three input
beams are used to provide N element bias, I element
inputs, and N element inputs, which are controlled by
polarizer pairs P1, P2, and P3, respectively. The I
element input path, SI-BS5-BS1-L2-BS8-LCLV, is
imaged with a magnification factor of 0.8. The same
magnification factor is applied to the N element input
path, SN-BS1-L2-BS8-LCLV. Two feedback paths
are implemented. One is for the I to N connection, which
is BS9-BS10-L3-BS11-L5-BS12-BS8-LCLV. The
other feedback path is through mirrors M5 and M6,
and is for the N to N self-feedback connection. Each
feedback path images from the LCLV output plane to
the LCLV input plane; the I to N feedback path also
shifts the image to the N element input. Mask MK2 is
used to block the bias beam to the I elements. Masks
MK3 and MK4 block the N and I element outputs in
the I to N and N to N feedback paths, respectively.

Fig. 9. Simulation results for spatially and temporally correlated
noise; sc is the spatial correlation range and T is the temporal
correlation period. Three noise sources are simulated simultaneously,
each with perturbation p = ±10%. The nmse is the normalized
mean square error of the network output. Only spatially correlated
noise (T = 1) with (a) sc = 3, nmse = 0.08, (b) sc = 13, nmse =
0.13; spatially and temporally correlated noise (T = 25) with (c) sc =
3, nmse = 0.14, (d) sc = 13, nmse = 0.18.

Fig. 10. Effect of device drift in the ION. The drift is uniform over
all neuron units. High frequency drift (T = 1) with (a) p = ±10%,
nmse = 0.02, (b) p = ±25%, nmse = 0.09; low frequency drift (T = 50)
with (c) p = ±10%, nmse = 0.07, (d) p = ±25%, nmse = 0.11.

Figures 13(a) and (b) give the input/output characteristics
of the I and N elements for an applied voltage
of 5.0 V rms at a frequency of 1.5 kHz. In our case, the
Hughes LCLV used has a twisted nematic liquid crystal
and a CdS photoconductor. Self-feedback for the
N element is necessary to fulfill the constraints of the
ION model. Figure 14 shows experimental results of a
single neuron. Figures 14(a)-(d) show binary subtraction,
while Figs. 14(e) and (f) show gray level subtraction.
The neuron size is ~2-mm diameter, as was the
pixel size for the measurements in Fig. 13.

Fig. 11. Effect of gain variation of the I element in the ION. This
effect is nonuniform with some spatial correlation. High frequency
gain variation with (a) T = 1, sc = 3, p = ±10%, nmse = 0.04, (b) T =
1, sc = 3, p = ±25%, nmse = 0.11; low frequency variation with (c) T
= 25, sc = 9, p = ±10%, nmse = 0.09, (d) T = 25, sc = 9, p = ±25%,
nmse = 0.18.

To demonstrate a 2-D array of neuron units, the
continuous case was implemented, i.e., no isolation
between neuron units. Figures 15(a) and (b) show the
experimental result for binary subtraction. Two character
sets, each 6-mm square, are chosen for the N (left
side) and I (right side) inputs [Fig. 15(a)]. All four
possible cases are included (corresponding to 1 or 0 for
the N element input and 1 or 0 for the I element input).
A bias is added to the N element inputs as described in
Sec. III. Figure 15(b) shows the I element outputs
(right side) and the final neuron outputs (left side;
these are also the N element outputs). The ideal
result is the residual leg of the character R (right top)
and the full character T (right bottom) in the N element
area, and is in agreement with Fig. 15(b); these
regions correspond to a 1 on the excitatory inputs and a
0 on the inhibitory inputs.

To test the subtraction linearity for the gray level
case, we set the I inputs to their minimum and the N
inputs to a small but nonzero value and measure the N
element output. For every increment in the I input,
we adjust the N input to keep the N element output
constant. Figure 15(c) is a plot of the (normalized)
resulting linear subtraction of the N inputs from I
inputs. Essentially, it is equal to the complementary
plot of the I element response. Here we use the full
operation range of the inverting region of the LCLV.
If we limit the operation of the I inputs to be 50% of
$a_2$, we obtain subtraction that is very close to linear.
In our setup, the range of the attenuated I output
measured at the N element input is ~0.2 μW/cm².
With the current setup, the nonlinearity of the N element
needs to have relatively high differential gain to
compensate for the significant loss in the feedback (I
→ N) path, thus the use of self-feedback for the N
elements.

VII.  Variant  of  the  ION  Model 

The  ION  model  emulates  a  biological  neuron  by 
using  spatial  coding  for  the  sign  of  the  input  signals. 
To  accommodate  some  of  the  artificial  neuron  models 
that  use  bipolar  neuron  outputs,  a  variant  of  the  ION 
model  is  proposed  which  uses  complementary  inputs 
and  weights.36  It  is  given  by 


Fig. 12. Experimental setup of the ION test circuit. Legend: SF,
spatial filter; L1-6, lenses; M1-6, mirrors; P1-5, polarizers; BS1-12,
beam splitters; Det 1-2, power meters; CAM 1-2, CCD cameras;
MK 1-4, masks; SI, I element input; SN, N element input.

Fig. 13. LCLV characteristics of the I and N element in the test
circuit for V = 5.0 V, f = 1.5 kHz. [Panels: (a) I element response
and (b) N element response, each plotted against input intensity.]
The vertical axis is the intensity measured at the LCLV output,
when in the system of Fig. 12 with a laser power of 200 mW. The I
element is fairly linear within 50% of its operation range. The
self-feedback of the N element (b) is necessary to satisfy the ION
requirement for this particular device.


The terms $(1 - V_j)/2$, $(1 + V_j)/2$, $(1 - W_{ij})/2$, and
$(1 + W_{ij})/2$ are nonnegative. The $(1 + V_j)/2$ and $(1 - V_j)/2$
terms can be generated in some complementary devices
by using orthogonal polarizations (e.g., LCLV) or
by reflected and transmitted beams (e.g., some bistable
optical devices); this makes the model directly
cascadable. In this case, the input to the net must be
inverted, and the signal and its complement are preserved
throughout the network. No other I elements
are necessary. If such complementary devices are not
available, the neuron input and output can be left in
the form $(1 + V_j)/2$. Then an I element can be used at
each neuron to generate $(1 - V_j)/2$ from $(1 + V_j)/2$.

Fig. 14. Experimental results of a single neuron showing N element
output (left side of each figure) and I element output (center pixel of
each figure). The rightmost pixel is used for alignment. The following
data are normalized: (a) $I_{\rm exc} = 1$, $I_{\rm inh} = 0$; (b) $I_{\rm exc} = 1$, $I_{\rm inh} = 1$;
(c) $I_{\rm exc} = 0$, $I_{\rm inh} = 0$; (d) $I_{\rm exc} = 0$, $I_{\rm inh} = 1$; gray level case, with (e) $I_{\rm exc} =
1$, $I_{\rm inh}$ = [illegible], and (f) $I_{\rm exc} = 0.5$, $I_{\rm inh} = 0.5$.

In  this  model,  there  is  no  spatial  distinction  between 
excitatory  and  inhibitory  channels.  Restricting  the 
model to fully connected networks [as in Eq. (15)]
keeps  the  threshold  neuron  independent;  this  yields 
the  simplest  hardware  implementation.  Alternative¬ 
ly,  by  summing  in  Eq.  (15)  only  over  the  inputs  to  each 
neuron,  a  partially  connected  network  can  be  imple¬ 
mented,  at  the  expense  of  an  increase  in  hardware 
complexity  because  of  the  neuron  dependent  thresh¬ 
old. 
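The role of the threshold can be seen from an elementary identity: for bipolar $w$ and $v$, $wv = 2[(1+w)(1+v)/4 + (1-w)(1-v)/4] - 1$, so a bipolar weighted sum equals twice a sum of nonnegative products minus the number of inputs. The sketch below checks this numerically. It illustrates the identity only and is not a reproduction of Eq. (15); the function names are hypothetical.

```python
import random

def bipolar_sum(weights, signals):
    """Direct weighted sum with signed weights and signals."""
    return sum(w * v for w, v in zip(weights, signals))

def complementary_sum(weights, signals):
    """Same sum using only the nonnegative terms (1 +/- w)/2 and (1 +/- v)/2."""
    pos = sum((1 + w) / 2 * (1 + v) / 2 for w, v in zip(weights, signals))
    neg = sum((1 - w) / 2 * (1 - v) / 2 for w, v in zip(weights, signals))
    fan_in = len(weights)   # this offset plays the role of a threshold
    return 2 * (pos + neg) - fan_in

rng = random.Random(0)
w = [rng.uniform(-1, 1) for _ in range(8)]
v = [rng.choice([-1, 1]) for _ in range(8)]
```

The offset equals the number of inputs summed. In a fully connected net it is the same for every neuron, while in a partially connected net it depends on each neuron's fan-in, which is why the threshold becomes neuron dependent.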

For example, an Amari net12 uses binary bipolar
neurons and its retrieval operation is given by

$$V_i = \Phi\Big(\sum_j W_{ij} V_j\Big), \qquad (16)$$

where $V_i \in \{+1, -1\}$ is the output state and $W_{ij} \in [-1, 1]$
is the normalized weight from neuron $j$ to neuron $i$.
The nonlinear output function $\Phi(x)$ is equal to 1 for $x >
0$, otherwise it is −1. The net is fully connected. Our
complementary model can implement this directly.

Fig. 15. Results of binary subtraction with (a) input patterns, N
inputs (left) and I inputs (right), at LCLV input; and (b) outputs, N
element outputs, the subtraction result (left), and I element outputs
(right). The ideal result is the residual leg of R (top right) and full T
(bottom right) of the N element output. (c) Gray-level subtraction
showing normalized N input vs I input for a constant N output. The
subtraction is quite linear if we only use 50% of the maximum I input
($a_2$).
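A retrieval step of this form, $V_i \leftarrow \Phi(\sum_j W_{ij} V_j)$, can be sketched in a few lines. The synchronous update and the single-pattern outer-product weights are assumptions chosen to keep the example small; they are not stated in the text, and the function names are hypothetical.

```python
def phi(x):
    """Binary bipolar output function: 1 for x > 0, otherwise -1."""
    return 1 if x > 0 else -1

def retrieve(weights, state):
    """One synchronous retrieval step over all neurons."""
    return [phi(sum(w_ij * v_j for w_ij, v_j in zip(row, state)))
            for row in weights]

def outer_product_weights(pattern):
    """Hypothetical weight choice: normalized outer product of one stored pattern."""
    n = len(pattern)
    return [[p_i * p_j / n for p_j in pattern] for p_i in pattern]

stored = [1, -1, 1, 1, -1, -1, 1, -1]
W = outer_product_weights(stored)   # all entries lie in [-1, 1]
probe = list(stored)
probe[0] = -probe[0]                # corrupt one element of the stored pattern
recalled = retrieve(W, probe)
```

With these weights the stored pattern is a fixed point, and a probe with one flipped element is restored in a single step.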

VIII. Application Examples

As an example of the use of the original ION model
in a neural net, a conceptual diagram of an implementation
of a single layer feedback net is shown in Fig. 16.
It utilizes a single 2-D spatial light modulator for both I
and N elements. The output of the I element is imaged
onto the input of the N element, after passing
through a ND filter as the (uniform) attenuation. A
uniform bias beam is also input to the N element. The
N element output is fed back through an interconnection
hologram to the inputs of both I and N elements,
representing inhibitory and excitatory lateral connections,
respectively.

Cooperative and competitive interactions37,38 are
two main neural mechanisms of information processing
in the human brain. The macroscopic behavior of
the neural network exhibits a cooperative property but
it may locally execute competitive operations. Structurally,
these two mechanisms are comprised of interactions
of excitatory and inhibitory signals through
feedforward, lateral, and feedback connections in the
neural network. Heretofore, the discussion has considered
only conventional inner-product neurons,
which perform a weighted sum of their input signals.
The ION concept can also be applied to other types of
neuron, for example, those based on mass action.
These neurons use a mass action law to model neuron
behavior,26 which tends to cause competition for the
limited membrane sites. The following will briefly
discuss two types of neural network models, those of
Fukushima and Grossberg, and their implementation
using the ION.

Competitive neural networks can be used in feature
extraction, pattern recognition, and associative memory.14-17,19-23
Fukushima's neocognitron is a multilayer
feedforward neural network used for pattern recognition.
His more recent models are bidirectional and
can serve as feature extractor, pattern recognizer, and
associative memory.15-18 The interaction relationship
of his models can be written as


Fig. 16. Single layer feedback net using a single
spatial light modulator to implement both I and N
elements. [Figure label: interconnection hologram.]
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H  2  -«vo+  2  -^-4  2  *»v> 

k€  A^  ,tArm 

-t>i  2  a.*v*  -  2  a^v4  ’  (17) 

If  A,  J 

where tilded weights ($\tilde{a}_{ij}$, $\tilde{a}_{ik}$, and $\tilde{a}_{il}$) are fixed weights
with a Gaussian distribution with respect to the distance
between the current cell and the input cell.
Terms $A_{\rm pre}$, $A_{\rm post}$, and $A_l$ denote the interconnection
region from the previous layer, postlayer, and lateral
inhibition area, respectively. The other weights ($a_{ij}$,
$a_{ik}$, $b_i$, and $c_i$) are modifiable subject to winner take all
learning,  i.e.,  within  a  specified  region,  only  the  neuron 
with  the  strongest  output  can  modify  its  weights.  The 
first  two  terms  in  Eq.  (17)  are  excitatory  inputs  from 
previous  and  postlayers,  and  the  third  and  fourth 
terms  are  used  to  provide  adaptive  level  control.  The 
last  term  in  Eq.  (17)  is  the  lateral  inhibition  (i.e.,  in¬ 
hibitory  connections  within  the  layer).  The  ION  mod¬ 
el  can  implement  these  by  putting  the  last  three  terms 
into  the  /  element,  with  the  first  two  terms  going  di¬ 
rectly  to  the  N  element.  Typically,  only  a  subset  of 
these  terms  is  present  in  any  one  of  Fukushima’s  mod¬ 
els;  for  example,  the  cognitron  and  neocognitron  mod¬ 
els11-14  use  only  the  first,  third,  and  fifth  terms,  while 
his  hierarchical  associative  memory16  uses  all  but  the 
last  term. 

Another  cooperative  competitive  type  of  neural  net¬ 
work  is  Grossberg’s  on-center  off-surround  enhance¬ 
ment  and  adaptive  resonance  theory.16-23  Grossberg 
uses  a  mass  action  law  in  his  models,  which  can  be 
described  as 

$$\dot{x}_i = -A x_i + (B - x_i) I_{\rm exc} - x_i I_{\rm inh},$$

$$I_{\rm exc} = \sum_j \psi(x_j) C_{ij} + I_i, \qquad (18)$$

$$I_{\rm inh} = \sum_j \phi(x_j) D_{ij} + J_i,$$

where $x_i$ is the membrane potential of neuron $i$, and $I_{\rm exc}$
and $I_{\rm inh}$ denote the total excitatory and inhibitory inputs
to neuron $i$. Terms $\psi(x_j)$ and $\phi(x_j)$ are the output
of neuron $j$ and its inhibitory interneuron, respectively,
and are sigmoid functions in most cases. Terms $I_i$
and $J_i$ are the total excitatory and inhibitory inputs
from other layers. Terms $A$ and $B$ are the decay constant
and maximum membrane potential, respectively;
$C_{ij}$ and $D_{ij}$ are the interconnection weights within
this layer. The above equation can be grouped into
two terms to be implemented by I and N elements,
respectively. For the discrete case, we can rewrite Eq.
(18) as

$$x_i(k + 1) = x_i(k)[1 - (A + I_{\rm exc} + I_{\rm inh})] + B I_{\rm exc}. \qquad (19)$$

This is a shunting model. The crucial difference
from an implementation point of view is the product
between the neuron inputs and the neuron potential in
the first term of Eq. (19). To implement this term, an
I element with adaptive gain is needed; or the I element
read beam can be modulated by $x_i$ to provide the
multiplication. The second term is the excitatory
part, which can be fed to the N element directly. For
the lumped case of Grossberg's model, there is no delay
between the output of the neuron and its interneurons.
We assume that the output characteristic functions
$\psi(\cdot)$ and $\phi(\cdot)$ are the same to simplify the complexity of
the neuron. For the steady state case, the membrane
potential is

$$x_i = \frac{B I_{\rm exc}}{A + I_{\rm exc} + I_{\rm inh}}. \qquad (20)$$
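The discrete shunting update and its steady state can be checked with a short numerical sketch. The inputs are held constant here, which is an assumption for illustration only, since in a network they depend on the outputs of other neurons. The parameter values and function names are also hypothetical. The steady-state expression follows from setting $x(k+1) = x(k)$ in the update.

```python
def shunting_step(x, i_exc, i_inh, decay, x_max):
    """One discrete update: x(k+1) = x(k)[1 - (A + I_exc + I_inh)] + B * I_exc."""
    return x * (1.0 - (decay + i_exc + i_inh)) + x_max * i_exc

def steady_state(i_exc, i_inh, decay, x_max):
    """Fixed point of the update: B * I_exc / (A + I_exc + I_inh)."""
    return x_max * i_exc / (decay + i_exc + i_inh)

A, B = 0.1, 1.0            # decay constant and maximum potential (illustrative)
I_EXC, I_INH = 0.5, 0.3    # constant excitatory and inhibitory inputs (illustrative)

x = 0.0
for _ in range(50):
    x = shunting_step(x, I_EXC, I_INH, A, B)
```

The iteration converges when $|1 - (A + I_{\rm exc} + I_{\rm inh})| < 1$. The division in the fixed point is the operation that the read-beam modulation is meant to provide optically.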


An optical implementation of pixel-by-pixel division
has been shown by Efron et al.39; here we can use
the output of the N element as the read beam of the I
element to provide the required division. Figure 17
shows a conceptual diagram to implement Grossberg's
cooperative competitive network. In the figure, we
use $\psi(x)$ to approximate the membrane potential $x$.
(This only holds for output characteristics that are
nearly linear. If this is not the case, the N elements
shown are linear, and an additional nonlinear N element
device is inserted before the interconnection hologram.)
In Fig. 17, masks B and C are used to provide
read beams for the N and I elements, respectively.
Each mask is in an image plane of the LCLV. The
read beam of the I element derives from the N element
output through mask C. Here we only show the external
input and lateral on-center off-surround connections,
which are realized by interconnection unit F.
The most powerful interconnection unit would utilize
holograms in a volume medium.

For the unlumped system, which is more common in the biological neural
network, the inhibitory signal comes from an interneuron with a different
characteristic. The interaction can be formulated as

$$\dot{x}_i = -A x_i + (B_i - x_i)\Bigl[\sum_j \cdots + I_i\Bigr] - \cdots \Bigl[\sum_j g(y_j) D_{ij} + J_i\Bigr],$$

$$\dot{y}_i = -E y_i + \sum_{j=1}^{n_3} x_j F_{ij}, \tag{21}$$

where $y_i$ is the activation state (potential) of interneuron i, which
receives excitatory signals from a total of $n_3$ excitatory neurons.
Term $F_{ij}$ represents the interconnection strength from neuron j to
neuron i and $D_{ij}$ is the weight from the output of the interneuron to
its neighboring excitatory neurons. For the full neuron with potential
$x_i(k)$, we need one I element with adaptive gain and one linear N
element. The interneuron, which has potential $y_i(k)$, is implemented by
one N element only.
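As a rough numerical illustration of the interneuron part of this model, the sketch below integrates dy_i/dt = -E y_i + sum_j x_j F_ij with a forward Euler step while holding the excitatory potentials x_j fixed. The function name, the step size `dt`, the fixed-input simplification, and the example values are all assumptions made for illustration; none of them come from the paper.

```python
def simulate_interneurons(x, F, E, dt=0.01, steps=2000):
    """Forward-Euler integration of dy_i/dt = -E*y_i + sum_j x_j*F[i][j].

    x : excitatory potentials x_j, held constant here (an assumption)
    F : F[i][j] is the interconnection strength from neuron j to interneuron i
    E : decay rate of the interneuron potential
    """
    # The excitatory drive is constant because x is held fixed.
    drive = [sum(F_ij * x_j for F_ij, x_j in zip(row, x)) for row in F]
    y = [0.0] * len(F)
    for _ in range(steps):
        y = [y_i + dt * (-E * y_i + d_i) for y_i, d_i in zip(y, drive)]
    return y


# Hypothetical example: three excitatory neurons feeding two interneurons.
x = [1.0, 0.5, 0.0]
F = [[0.2, 0.4, 0.4],
     [0.0, 1.0, 0.0]]
y = simulate_interneurons(x, F, E=2.0)
```

With a constant drive, each potential settles toward drive_i / E, which is what the linear N element representing the interneuron would have to reproduce.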


Fig. 17. Conceptual diagram to implement Grossberg's mass action type
neuron based on the ION model utilizing a LCLV.


IX. Discussion and Conclusion

A general model for optical neuron units has been discussed to perform
the requisite subtraction optically in incoherent optical neural
networks. This model uses two separate responses to implement an optical
neuron with dynamic and global thresholding. The ION model can implement
analog or binary non-negative neuron unit outputs, excitatory and
inhibitory neuron unit inputs, and can be used in fully or partially
connected neural networks. Implementation of conventional inner product
neuron units and mass action law neuron units for shunting networks using
the ION concept has been described.

In addition, a variant of the ION was given that encodes signals and
their analog complements in a two-channel system. It permits bipolar
neuron outputs. The trade-off is an increase in interconnection network
complexity and the requirement for a neuron-dependent threshold for the
case of partially connected networks.

We have summarized sources of noise for the ION and proposed a noise
model. From the result of computer simulations, it seems that the example
network performs much better for quickly varying (i.e., temporally
uncorrelated) noise than for temporally correlated (more slowly varying)
noise. Due to the static input pattern and the competitive nature of the
network, once the noise term has survived a number of iterations, it will
continue to get stronger and will not die out. For other neural networks,
if the input to the network is time varying, slowly varying noise is
effectively an offset response of the network and might be adaptively
overcome by the network, while quickly varying noise interacts with the
input patterns and is generally more difficult to compensate.

For noise that is correlated, we have found that the qualitative effect
of each of the three noise sources (additive inhibitory, multiplicative
inhibitory, and additive excitatory) on the output of the net is
essentially the same. Since one of the noise terms N is the same for a
conventional neuron implementation as for the ION, it appears that an ION
implementation is not significantly different from a conventional neuron
implementation in terms of immunity to noise and device imperfections for
a given technology. Apparently the output is affected primarily by the
variance of the noise and by the degree of spatial and temporal
correlation, but not by the source of the noise. We conjecture that this
result may not be peculiar to the ION model or to the particular net that
we simulated, but is likely more general.

Two approaches have been described to implement the ION model:
homogeneous (one device) and heterogeneous (two devices). A liquid
crystal light valve response was given as an example for a homogeneous
implementation. To demonstrate the feasibility of the ION, 2-D arrays of
both analog and binary neuron units were experimentally demonstrated
using an LCLV in a homogeneous implementation, successfully exhibiting
both excitation and inhibition.

Appendix: Operational Analysis of the ION Model

A.  Device  Requirements 

To guarantee the proper operation of the ION model, the input signals and
the device characteristics must satisfy the following inequalities:

$$0 \le I_{\text{inh}} \le \alpha_1, \tag{A1}$$

$$0 \le I_{\text{exc}} \le \alpha_1, \tag{A2}$$

$$\alpha \ge \alpha_1 \quad \text{(heterogeneous case)}, \tag{A3}$$

$$\alpha \ge \alpha_1 + \alpha_2 \quad \text{(homogeneous case)}. \tag{A4}$$

Let $I_r$ be the maximum residual output of the device, i.e., the
worst-case functional zero output; $\alpha_2$ and $\alpha$ are defined as
the device inputs that correspond to an output of $I_r$ (Fig. 4). Terms
$I_r$, $\alpha_2$, and $\alpha$ are chosen to provide an acceptable
extinction ratio at the same time as sufficient separation between the I
and N element active regions. Equation (A1) results from the limited
domain of the I element. Equation (A3) comes from the requirement that
the bias beam be non-negative. For the homogeneous case, we need a more
strict constraint on $\alpha$, to prevent the N element from operating in
the inverting (I) region. Since the smallest possible input to the N
element is $I_{\text{bias}}$ (i.e., all other terms are zero), we need
$I_{\text{bias}} = (\alpha - \alpha_1) \ge \alpha_2$, which gives Eq.
(A4). In the case of the inhibitory signals being maximum, we require the
total input to the N element to satisfy:

$$\alpha_2 \le I_{\text{exc}} + (\alpha - \alpha_1) \le \alpha. \tag{A5}$$

The upper limit in Eq. (A5) is not necessary for all neuron models; we
include it here so that even with maximum excitatory input, sufficient
inhibitory input can be generated to set the neuron output to the zero
state. Combining Eq. (A5) with the above result that
$\alpha \ge \alpha_1 + \alpha_2$ and with the non-negativity constraint
on $I_{\text{exc}}$ yields Eq. (A2). The lower bound on $\alpha$ for both
cases is also dependent on the threshold requirement as shown below.
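The device requirements above amount to a handful of range checks, which can be sketched as a small validation routine. This is only an illustration: the function name and argument names are hypothetical, the inequalities are taken as non-strict, and Eq. (A5) is not checked separately because, as noted in the text, its upper limit reduces to Eq. (A2) and its lower limit follows from Eq. (A4) once the excitatory input is non-negative.

```python
def check_ion_device(alpha, alpha1, alpha2, i_inh_max, i_exc_max, homogeneous):
    """Return the labels of the violated requirements (A1)-(A4).

    alpha, alpha1, alpha2 : device input levels as defined in the appendix
    i_inh_max, i_exc_max  : largest inhibitory / excitatory inputs expected
    homogeneous           : True for one device, False for two devices
    All names are illustrative; inputs are assumed to be non-negative.
    """
    violations = []
    if not (0 <= i_inh_max <= alpha1):
        violations.append("A1")          # I element domain exceeded
    if not (0 <= i_exc_max <= alpha1):
        violations.append("A2")
    i_bias = alpha - alpha1              # bias point for zero threshold
    if homogeneous:
        if i_bias < alpha2:
            violations.append("A4")      # N element would enter the inverting region
    elif i_bias < 0:
        violations.append("A3")          # bias beam would have to be negative
    return violations


# Hypothetical device levels in arbitrary intensity units.
ok = check_ion_device(3.0, 1.0, 1.5, 1.0, 1.0, homogeneous=True)
too_small = check_ion_device(2.0, 1.0, 1.5, 1.0, 1.0, homogeneous=True)
```

The second call fails only the homogeneous constraint; the same levels would pass for a heterogeneous (two-device) implementation, where only a non-negative bias is required.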


B. Threshold Implementation

We make the reasonable assumption that the threshold $\theta$ has the
same maximum variation range as the input signals, i.e.,
$-\alpha_1 \le \theta \le \alpha_1$. Referring to Fig. 4,
$I_{\text{bias}} = \alpha - \alpha_1$ is the bias point for zero
threshold. To achieve a positive threshold, it is shifted left by an
amount $\theta$ to $\alpha - \alpha_1 - \theta$. For the heterogeneous
case, the shifted bias point must be non-negative, i.e.,

$$\alpha \ge \alpha_1 + \theta \quad \text{(heterogeneous case)} \tag{A6}$$

and $\theta$ is in the range $0 \le \theta \le \alpha_1$.

To prevent the N element from operating in the inverting region for the
homogeneous case (Fig. 4), the new bias point must fulfill the more
strict inequality $(\alpha - \alpha_1) - \theta \ge \alpha_2$, i.e.,

$$\alpha \ge \alpha_1 + \alpha_2 + \theta \quad \text{(homogeneous case)} \tag{A7}$$

and $\theta$ is in the same range.

To realize a negative threshold, the bias point is shifted right by an
amount $-\theta$. This relaxes the original constraint on $\alpha$ [Eqs.
(A3) and (A4)], so no new constraints are needed. A time-varying global
threshold can be implemented by varying the bias beam. The device
requirements [Eqs. (A6) and (A7)] then reflect the maximum positive
threshold expected, $\theta_{\max}$. If no other limitations are expected
on $\theta$, then $\theta = \alpha_1$ can be used.

C. Fan-in and Fan-out

We will assume binary neuron units during calculation of the maximum
fan-in and fan-out of the ION. As shown in Fig. 3(c), the output of the
jth neuron unit can be formulated as

$$I_j^{(\text{out})} = I_r + \Delta I_s V_j, \tag{A8}$$

where $V_j \in \{0, 1\}$ is the output state of the neuron unit, and
$I_r$ and $\Delta I_s$ are the residual output and output difference due
to a state change of the N element, respectively. Let the fan-in and
fan-out be $N_{\text{in}}$ and $N_{\text{out}}$, respectively; then the
effective input to neuron unit i from neuron unit j will be
$w_{ij} \times I_j^{(\text{out})} / N_{\text{out}}$. Therefore, the
summed input to one neuron unit i is:

$$I_i^{(\text{in})} = \frac{I_r}{N_{\text{out}}} \sum_{j=1}^{N_{\text{in}}} w_{ij} + \frac{\Delta I_s}{N_{\text{out}}} \sum_{j=1}^{N_{\text{in}}} w_{ij} V_j. \tag{A9}$$

The first term is a noise term due to the residual output of the optical
devices and the second term is the desired signal term. Two different
requirements on the relative sizes of these two terms are considered.

For networks with small to moderate fan-in, we require the signal term to
be greater than the noise term. To estimate the fan-in, we take the mean
of the above equation on both sides. Assuming the mean value of the
weight to be $\bar{w}_{ij}$ and considering the worst case of only one
input being active yield

$$\langle I_i^{(\text{in})} \rangle = \frac{I_r}{N_{\text{out}}} N_{\text{in}} \bar{w}_{ij} + \frac{\Delta I_s}{N_{\text{out}}} \bar{w}_{ij}. \tag{A10}$$

The fan-in can be calculated by setting the signal term greater than the
noise term, so

$$N_{\text{in}} < \frac{\Delta I_s}{I_r}, \tag{A11}$$

which can be taken as a measure of the extinction ratio of the N element.

For a neural network with large fan-in, we require instead that a
constant fraction $\beta$ of the maximum signal term be greater than the
noise term in Eq. (A9). In this case, the second term in Eq. (A10) is
multiplied by $\beta N_{\text{in}}$, and this yields
$1/\beta < \Delta I_s / I_r$. Thus, the extinction ratio of the device
gives a lower bound on $\beta$ but is independent of the fan-in. For
example, in many networks a $1/\beta$ ranging from 10 to 100 may be
sufficient for the optical neuron while the maximum fan-in may be
$10^3$-$10^4$.

The maximum fan-out can be calculated by considering the input to a
neuron unit under maximum excitation. This, taking the mean again, yields

$$\langle I_{\text{exc}} \rangle = \frac{I_r}{N_{\text{out}}} N_{\text{in}}^{(\text{exc})} \bar{w}_{ij} + \frac{\Delta I_s}{N_{\text{out}}} N_{\text{in}}^{(\text{exc})} \bar{w}_{ij}, \tag{A12}$$

where $N_{\text{in}}^{(\text{exc})}$ is the fan-in to the excitatory
site. The first term is assumed much smaller than the second and will be
neglected hereafter. We require the total input to the N element on these
conditions to be sufficient to drive the N element full on:

$$\alpha_1 + \gamma_s \frac{\Delta I_s}{N_{\text{out}}} N_{\text{in}}^{(\text{exc})} \bar{w}_{ij} + (\alpha - \alpha_1) \ge \Delta\alpha_N + \alpha, \tag{A13}$$

where the first and last terms on the left-hand side represent the input
due to the I element output (under minimum inhibition) and the bias,
respectively. Term $\gamma_s$ is the optical system loss from the output
of one neuron unit to the input of the next, and $\Delta\alpha_N$ is the
differential input required to turn the N element full on (Fig. 4).
Rearranging yields:

$$\frac{N_{\text{out}}}{N_{\text{in}}^{(\text{exc})}} \le \gamma_s \bar{w}_{ij} \frac{\Delta I_s}{\Delta\alpha_N}. \tag{A14}$$


Thus, the ratio of the fan-out to the excitatory fan-in is bounded above
by a measure of the differential gain of the N element, scaled by loss
factors $\gamma_s$ and $\bar{w}_{ij}$. During implementation of the ION
model, with some optical devices $\Delta I_s$ can be controlled, within
bounds, by the intensity of the readout beam; also, if necessary,
self-feedback can be employed on the N element. Both methods effectively
increase the differential gain of the N element and permit a larger
fan-out.
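The fan-in and fan-out limits derived in this section reduce to simple ratios of device parameters, so a short numerical sketch may help when comparing candidate devices. The function and argument names below are hypothetical, and the formulas are the readings of Eqs. (A11) and (A14) and of the large fan-in condition used here: fan-in below the ratio of output difference to residual output, 1/beta below that same ratio, and fan-out over excitatory fan-in bounded by the system loss times the mean weight times the output difference over the differential input needed to switch the N element fully on.

```python
def max_fan_in(delta_i, i_r):
    """Fan-in must stay below this value (small to moderate fan-in case)."""
    return delta_i / i_r


def min_signal_fraction(delta_i, i_r):
    """The fraction beta must exceed this value (large fan-in case)."""
    return i_r / delta_i


def max_fan_out_ratio(gamma_s, w_mean, delta_i, delta_alpha_n):
    """Upper bound on N_out / N_in(exc).

    gamma_s       : optical system loss between neuron units (0..1)
    w_mean        : mean interconnection weight (0..1)
    delta_i       : output difference between the two states of the N element
    delta_alpha_n : differential input needed to turn the N element full on
    """
    return gamma_s * w_mean * delta_i / delta_alpha_n


# Hypothetical device: extinction ratio of 100, 50% system loss, mean weight 0.5.
fan_in_limit = max_fan_in(delta_i=100.0, i_r=1.0)
beta_limit = min_signal_fraction(delta_i=100.0, i_r=1.0)
ratio_limit = max_fan_out_ratio(0.5, 0.5, delta_i=100.0, delta_alpha_n=5.0)
```

For these assumed numbers the extinction ratio caps the fan-in at 100 in the small fan-in regime, while the fan-out may be at most five times the excitatory fan-in.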

The residual output of the device may be less important for digital
optical computing, because the fan-in is low and we can offset the
holding power or bias point to overcome the residual output. But it is
crucial in optical neural networks. This is because the residual term
will be multiplied by the weights; as the network learns this will
produce a time varying deterministic noise term in the summed input. If
the noise is small and the weight modification varies sufficiently
slowly, this noise term may not affect the short term dynamics of the
neural network. Ideally, we need the residual term to be as small as
possible.

We also note that the device requirements given in Eqs. (A1) and (A2) can
now be rewritten in terms of known quantities as:

$$\Delta I_s \frac{N_{\text{in}}^{(\text{inh})}}{N_{\text{out}}} \le \alpha_1, \tag{A15}$$


where $N_{\text{in}}^{(\text{inh})}$ is the fan-in to the inhibitory site
and the worst-case assumption of all weights being equal to one has been
employed.


ABSTRACT

Abstract parallel computational models are discussed and related to
optical computing. Two classes of parallel computing models, shared
memory and graph/network, are used to analyze some of the possible
effects of optical technology on parallel computing. It is found that the
use of optics potentially provides certain fundamental advantages in
communication and implementation of the architectures based on these
models. In addition, some factors that limit the communication
capabilities of optical systems for network models are discussed.


INTRODUCTION 

In this paper we look at computational models for parallel processing in
an attempt to increase our understanding of the role that optical
computing might play. Most of the parallel architectures discussed in the
parallel processing community are heavily influenced by the constraints
of electronic systems. The purpose of our approach in this paper is to
abstract the notion of parallel computing from the limitations of any
given technology. This abstract model can then be used as a starting
point for the design of parallel optical computing architectures. In the
process, some of the consequences of inherent differences between optical
and electronic systems start to become apparent.

Computing paradigms are important for understanding the level and class
of problems that the computer scientist is addressing. Consider the
following structural paradigmatic classification: physical, functional,
computational. A representation and example of each of these paradigms is
illustrated in Table 1. Here we are only concerned with the computational
paradigm and the optical implications.


Before discussing computational models, both sequential and parallel, we
define computational order or complexity as it is used in this paper. For
a problem of size n, the order of its time (or space) complexity is
denoted by O(f(n)), which is defined such that f(n) is the asymptotic
behavior of that problem as n grows very large. O(f(n)) represents an
asymptotic upper bound, and Ω(g(n)), defined similarly, represents an
asymptotic lower bound. This notation will be used throughout this paper.
For formal definitions see (Gottlieb and Kruskal, 1984).
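For readers who want the notation spelled out, one common textbook convention is sketched below. It is offered as an assumption about the intended meaning, not as a quotation of the cited reference, whose formal definitions may differ in detail (for example in how the lower bound treats "infinitely often" versus "for all sufficiently large n").

```latex
% T(n): time (or space) required for a problem of size n.
\[
  T(n) = O\bigl(f(n)\bigr) \iff
  \exists\, c > 0,\ n_0 \ \text{such that}\ T(n) \le c\, f(n) \ \text{for all } n \ge n_0 ,
\]
\[
  T(n) = \Omega\bigl(g(n)\bigr) \iff
  \exists\, c > 0,\ n_0 \ \text{such that}\ T(n) \ge c\, g(n) \ \text{for all } n \ge n_0 .
\]
% Worked example: T(n) = 3 n \log_2 n + 5 n satisfies
% T(n) \le 8\, n \log_2 n for n \ge 2, so T(n) = O(n \log n),
% and T(n) \ge 5 n for n \ge 1, so T(n) = \Omega(n).
```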

Computational models are important because they measure the performance
of general classes of both sequential and parallel algorithms on an
idealized abstract machine. However, the performance of these models is
highly dependent on the class of algorithms. If the generic class of
algorithms is known for a specific problem (e.g. the communication
algorithms of broadcasting, reporting, sorting, etc.), then the
computational model which efficiently runs these algorithms would be a
starting point for the design of a computer architecture that would do
the same. The basic assumption is that algorithms which run well on a
computational model should run well on the model-derived architecture.
Our intention is to show that optics has a greater potential than
electronics for physically realizing key aspects of some of these
computational models.

SEQUENTIAL  COMPUTATIONAL  MODELS 

Since parallel computational models are for the most part extensions of
sequential models, we briefly discuss these sequential machines with
particular emphasis on the Random Access Machine. The most well-known and
basic of the sequential machines is the Turing machine (TM). The
universal TM has the capability of computing any algorithm that is
computable (a rather circular thesis since a universal TM defines what is
computable). A principal application of the TM is in determining lower
bounds on the space or time necessary to solve algorithmic problems. The
TM is a well-known computational model; for further interest see the very
informative text by Minsky (1967).


Table 1. Processing paradigm levels.

PARADIGMS       REPRESENTATION        EXAMPLE
Physical        Hardware/Technology   IC, Board
Functional      Architecture          PE, Memory, Interconnection Topology
Computational   Algorithms/Metrics    Turing Machine, Automata,
                                      Random Access Machine

The Random Access Machine (RAM) (Aho, Hopcroft, and Ullman, 1974) is a
less primitive computational model which can be stylized as a primitive
computer. The RAM model is a one-accumulator computer in which the
instructions are not allowed to modify themselves. A RAM consists of a
read-only input tape, a write-only output tape, a program and a memory.
The time on the RAM is bounded above by a polynomial function of time on
the TM. In particular, for a TM of time complexity T(n) ≥ n, a RAM can
simulate the TM in O(T(n)) or O(T(n) log n) time, depending on the cost
function used for the RAM. For the converse, using a TM to simulate a
RAM, the bounds on time required by the TM are higher and are highly
dependent on the RAM cost function used (Aho, Hopcroft, and Ullman,
1974). The program of a RAM is not stored in memory and is unmodifiable.
The RAM instruction set is small and consists of operations such as
store, add, subtract, and jump if greater than zero; indirect addresses
are permitted. A common RAM model is the uniform cost one, which assumes
that each RAM instruction requires one unit of time and each register one
unit of space. Attempts to parallelize the RAM computational model
resulted in many of the parallel computational models.
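To make the RAM description concrete, the following is a minimal sketch of a one-accumulator machine under the uniform cost assumption, where every executed instruction counts as one time unit. The mnemonics (`READ`, `LOAD`, `LOADI`, `STORE`, `ADD`, `SUB`, `JGTZ`, `WRITE`, `HALT`) and the tuple encoding are illustrative choices, not the instruction set of Aho, Hopcroft, and Ullman. The program is kept in a tuple separate from memory, so it cannot modify itself.

```python
def run_ram(program, inputs, max_steps=10_000):
    """Run a toy one-accumulator RAM; return (outputs, steps) under uniform cost."""
    program = tuple(program)   # the program lives outside memory and is read-only
    memory = {}                # register index -> integer, unset registers read as 0
    acc, pc, steps, in_pos = 0, 0, 0, 0
    outputs = []
    while pc < len(program):
        if steps >= max_steps:
            raise RuntimeError("step limit exceeded")
        op, arg = program[pc]
        pc += 1
        steps += 1             # uniform cost: one time unit per instruction
        if op == "READ":       # next symbol from the read-only input tape
            acc = inputs[in_pos]
            in_pos += 1
        elif op == "LOAD":
            acc = memory.get(arg, 0)
        elif op == "LOADI":    # indirect address: register named by register arg
            acc = memory.get(memory.get(arg, 0), 0)
        elif op == "STORE":
            memory[arg] = acc
        elif op == "ADD":
            acc += memory.get(arg, 0)
        elif op == "SUB":
            acc -= memory.get(arg, 0)
        elif op == "JGTZ":     # jump if accumulator is greater than zero
            if acc > 0:
                pc = arg
        elif op == "WRITE":    # append to the write-only output tape
            outputs.append(acc)
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return outputs, steps


# Add two input numbers: six instructions, hence six time units.
add_two = [("READ", None), ("STORE", 0), ("READ", None),
           ("ADD", 0), ("WRITE", None), ("HALT", None)]
result, cost = run_ram(add_two, [3, 4])
```

A logarithmic cost variant would instead charge each instruction in proportion to the bit lengths of its operands, which is the distinction behind the two simulation bounds quoted above.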

SHARED  MEMORY  MODELS 

We will discuss only two classes of parallel computational models:
shared-memory models and graph/network models. As might be inferred from
the shared memory term, these models are based on global memories and are
differentiated by their accessibility to memory. In Fig. 1 we see a
typical shared memory model where individual processing elements (PE's)
have variable simultaneous access to an individual memory cell. (A
processing element is a physically isolated computational unit consisting
of some local memory and computational power. A PE can be construed as a
computational primitive from which more sophisticated architectures can
be constructed (Hwang and Briggs, 1984).) Each PE can access any cell of
the global memory in unit time. In addition, many PE's can access many
different cells of the global memory simultaneously. In the models we
discuss, each PE is a slightly modified RAM without the input and output
tapes, and with a modified instruction set to permit access to the global
memory. A separate input for the machine is provided. A given processor
can generally not access the local memory of other processors.

Fig. 1. Conceptual diagram of shared memory models.

The models differ primarily in whether they allow simultaneous reads
and/or writes to the same memory cell. The PRAC, parallel random access
computer (Lev, Pippenger and Valiant, 1981), does not allow simultaneous
reading or writing to an individual memory cell. The PRAM, parallel
random access machine (Fortune and Wyllie, 1978), permits simultaneous
reads but not simultaneous writes to an individual memory cell. The WRAM,
parallel write random access machine, denotes a variety of models that
permit simultaneous reads and certain writes, but differ in how the write
conflicts are resolved. For example, a model by Shiloach and Vishkin
(1981) allows a simultaneous write only if all processors are trying to
write the same value. The paracomputer (Schwartz, 1980) has simultaneous
writes but only "some" of all the information written to the cell is
recorded. The models represent a hierarchy of time complexity given by

$$T_{\text{PRAC}} \ge T_{\text{PRAM}} \ge T_{\text{WRAM}},$$

where T is the minimum number of parallel time steps required to execute
an algorithm on each model. More detailed comparisons are dependent on
the algorithm (Borodin and Hopcroft, 1985).
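The access rules that separate the three models can be made concrete with a small simulation of one synchronous step. The sketch below is illustrative only: the function name `parallel_step`, the dictionary-based memory, and the choice to let reads see the memory contents from before the step's writes are assumptions, and the WRAM branch uses the common-value rule attributed above to Shiloach and Vishkin.

```python
from collections import Counter


class AccessConflict(Exception):
    """Raised when a step violates the access rules of the chosen model."""


def parallel_step(memory, reads, writes, model):
    """Execute one synchronous step on a shared memory.

    memory : dict mapping cell address -> value (updated in place)
    reads  : dict mapping PE id -> address to read
    writes : dict mapping PE id -> (address, value) to write
    model  : "PRAC", "PRAM" or "WRAM"
    Returns a dict mapping PE id -> value read.
    """
    if model not in ("PRAC", "PRAM", "WRAM"):
        raise ValueError(f"unknown model {model!r}")

    # PRAC: no two PEs may read the same cell in one step.
    if model == "PRAC":
        if any(count > 1 for count in Counter(reads.values()).values()):
            raise AccessConflict("PRAC forbids simultaneous reads of one cell")

    # Group the values written to each cell and validate before committing.
    values_by_addr = {}
    for addr, value in writes.values():
        values_by_addr.setdefault(addr, []).append(value)
    for addr, values in values_by_addr.items():
        if len(values) > 1:
            if model != "WRAM":
                raise AccessConflict(f"{model} forbids simultaneous writes")
            if len(set(values)) > 1:
                raise AccessConflict("WRAM (common-value rule): values differ")

    results = {pe: memory[addr] for pe, addr in reads.items()}
    for addr, values in values_by_addr.items():
        memory[addr] = values[0]
    return results


memory = {0: 5, 1: 0}
seen = parallel_step(memory, reads={0: 0, 1: 0}, writes={}, model="PRAM")
parallel_step(memory, reads={}, writes={0: (1, 9), 1: (1, 9)}, model="WRAM")
```

The same concurrent read that succeeds on the PRAM raises `AccessConflict` on the PRAC, which is one way to see why an algorithm may need more steps on the more restrictive model.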

Implications of optics

In general, none of these shared memory models are physically realizable
because of actual fan-in limitations. Optical interconnections permit
greater fan-in than electronic systems. In addition, the non-interacting
property of photons in a linear medium (versus the mutual interaction of
electrons) may permit simultaneous memory reads much more easily. As an
electronic example, the ultracomputer (Schwartz, 1980) is an
architectural manifestation of the paracomputer that uses a hardwired
Omega network between the PE's and memories; it simulates the
paracomputer within a time penalty of
0( log^n  ). 

Optical systems could in principle be used to implement this parallel
memory read capability. As a simple example, a single 1-bit memory cell
can be represented by one pixel of a 1-D or 2-D array; the bit could be
represented by the state (opaque or transparent) of the memory cell. Many
optical beams can simultaneously read the contents of this memory cell
without contention (Fig. 2). In addition to this, an interconnection
network is needed between the PE's and the memory that can allow any PE
to communicate with any memory cell, preferably in one step, and with no
contention. A regular crossbar is not sufficient for this because fan-in
to a given memory cell must be allowed. Figure 3 shows a conceptual block
diagram of a system based on the PRAM model; here the memory array
operates in reflection instead of transmission. The fan-in required of
the interconnection network is also depicted in the figure.

Fig. 2. One memory cell of an array, showing multiple optical beams
providing contention-free read access.

Fig. 3. Block diagram of an optical architecture based on parallel RAM
models.

Fig. 4. Example of an optical crossbar interconnection network.

Optical systems can potentially implement crossbars that also allow this
fan-in. Several optical crossbar designs discussed in (Sawchuk, et al.,
1986) exhibit fan-in capability. An example is the optical crossbar shown
schematically in Fig. 4. The 1-D array on the left could be optical
sources (LED's or laser diodes) or just the location of optical signals
entering from previous components. An optical system spreads the light
from each input source into a vertical column that illuminates the
crossbar mask. Following the crossbar mask, a set of optics collects the
light transmitted by each row of the mask onto one element of the output
array. The states of the pixels in the crossbar mask (transparent or
opaque) determine the state of the crossbar switch. Multiple transparent
pixels in a column provide fan-out; multiple transparent pixels in a row
provide fan-in. Many optical reconfigurable network designs are possible,
and provide tradeoffs in performance parameters such as bandwidth,
reconfiguration time, maximum number of lines, hardware requirements,
etc. Unfortunately, most simple optical crossbars will be limited in size
to approximately 256 x 256 (Sawchuk, et al., 1986). We are currently
considering variants of this technique to increase the number of
elements. Possibilities include using a multistage but nonblocking
interconnection network (e.g. Clos), a hierarchy of crossbars, and/or a
memory hierarchy.
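The behavior of the mask-based crossbar described above can be imitated by a matrix-vector product in which the mask plays the role of a binary matrix. The sketch below is an idealization made for illustration: the function name is hypothetical, intensities are assumed to add incoherently at each output, and the splitting loss incurred when one source is spread over a whole column is ignored.

```python
def crossbar_output(mask, inputs):
    """Intensity collected at each output element of an idealized optical crossbar.

    mask[r][c] : 1 if the pixel in row r, column c is transparent, 0 if opaque
    inputs[c]  : intensity of the source that illuminates column c
    Output r collects all light transmitted by row r of the mask.
    """
    if any(len(row) != len(inputs) for row in mask):
        raise ValueError("each mask row needs one pixel per input source")
    return [sum(pixel * x for pixel, x in zip(row, inputs)) for row in mask]


mask = [
    [1, 0, 0],   # output 0 receives input 0
    [1, 1, 0],   # output 1 receives inputs 0 and 1: fan-in of two
    [0, 0, 1],   # output 2 receives input 2
]
inputs = [1.0, 2.0, 4.0]
outputs = crossbar_output(mask, inputs)
```

Column 0 of this mask has two transparent pixels, so input 0 fans out to outputs 0 and 1, while row 1 shows the fan-in that a conventional one-to-one crossbar setting would not allow.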

GRAPH/NETWORK  MODELS 

Graph/network models are characterized by a collection of usually
identical PE's that are interconnected with a fixed network. They can be
represented by graphs, with a node of the graph for each PE and an arc or
link of the graph for each PE to PE interconnection. The models differ
from one another in the length of time required for a message to traverse
one arc of the graph, and on the assumptions placed on the PE's such as
their ability to respond to multiple messages. The feasibility of
implementation of these models depends on the connectivity of the graph;
if the connectivity is not too high, the model is much more readily
implemented than the shared memory models.

Network models can be compared to shared memory models. Any of the shared
memory models can efficiently simulate (in O(1) time) a network model.
This is done by dedicating a different cell of the global memory for each
link of the network. One PE sends a message to another by writing the
message to a memory cell which the other PE then reads. Conversely,
suppose the network model is capable of (partial) routing in r(n) time.
Then it can simulate one step of the PRAC, PRAM, or WRAM in O(r(n)) time
(Borodin and Hopcroft, 1985).
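The constant-time simulation of a network by a shared memory can be sketched directly: one memory cell per directed link, a write phase for the senders, and a read phase for the receivers. The function name and data layout below are hypothetical. Because each cell is written by exactly one PE and read by exactly one PE, the sketch never needs simultaneous access to a cell, so even the most restrictive shared memory model would suffice.

```python
def simulate_network_step(links, outgoing):
    """Simulate one message-passing step using one shared cell per link.

    links    : iterable of directed links (src, dst)
    outgoing : dict mapping (src, dst) -> message sent over that link this step
    Returns a dict mapping dst -> list of (src, message) received.
    """
    cells = {link: None for link in links}       # dedicated global memory cells
    for link, message in outgoing.items():       # write phase: senders write
        if link not in cells:
            raise KeyError(f"no such link {link!r}")
        cells[link] = message
    inbox = {}
    for (src, dst), message in cells.items():    # read phase: receivers read
        if message is not None:
            inbox.setdefault(dst, []).append((src, message))
    return inbox


ring = [(0, 1), (1, 2), (2, 0)]
received = simulate_network_step(ring, {(0, 1): "a", (2, 0): "b"})
```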

In a highly parallel machine communications are exceedingly important and
for many tasks can dominate the execution time of the algorithm. We
therefore concentrate on communications in our analysis of these models.
We will use communication algorithms for our analysis since they have
been found to be reasonable predictors of performance (Levitan, 1985).

PE  Complexity  and  Communications 

Since the performance of network models depends on the assumptions on the
individual processing elements, we need to consider these assumptions and
their relationships to communication tasks. We will show that in general
the communications between PE's (or the network topology) cannot be
completely decoupled from the hardware complexity of the PE's themselves.
After giving a relationship between PE space complexity and
interconnection capability, we will be able to identify what reasonable
assumptions on the PE complexity are for the optics and electronics
cases. These assumptions will be used in assessing the performance of
different communication tasks on network models. In this paper the term
PE complexity refers to the space complexity of each PE. We will not
discuss time complexity of individual PE's.

For simplicity, we will assume the bandwidth of each I/O line to a PE is
fixed and is given. Thus we are a priori not considering one of the
potential advantages of an optical system over an electronic one. We
will, however, consider the effect of the number of I/O lines to a PE.
The complexity of each PE must grow at least as fast as the number of I/O
lines to the PE. Thus a lower bound on the PE complexity is Ω(d), where d
is the number of I/O lines to the PE. This lower bound applies even in
the simplest case, in which all input lines are combined (e.g. by a logic
operation), and the output lines are not allowed to carry distinct
signals in a single time step. If signals on different input lines must
be kept distinct yet are accepted in a single time step, or if distinct
signals can be put on multiple output lines in a single step, then the PE
time or space complexity will be higher.

Implications of this lie in the communication ability of PE networks,
particularly in the optics case. With electronic technology, the number
of I/O lines to a PE is generally quite limited and this limits the
ability of the PE's to communicate. This is due to limited pinout, cost
of interconnections, etc. The PE complexity is in practice not an issue
for communications. In the optics case, however, there are no pinout
restrictions and many parallel interconnection lines are feasible.
However, there are limitations on the total number of interconnections in
an optical PE network; these are due to the PE complexity itself. In
other words, the PE's have to be able to accommodate all of the I/O
lines. The optics case apparently allows a balance between the
interconnections and the PE complexity; in the electronics case the
interconnections are further limited by technology factors.

Communication  Time  and  Space  of  Network  Models 

In this section we will give the time required to execute different
communication tasks on different network topologies, and values of some
measures of space complexity required in implementing different network
topologies in VLSI and optics. We are concerned with fine-grained
systems, that is, systems with a large or very large number of relatively
simple PE's. As a minimum, we assume each PE can store its own address so
that it knows where it is located. Many algorithms can become quite
difficult without this feature. This implies that the PE complexity must
be allowed to grow as Ω(log N).

In an electronic system, the number and length of interconnections is
important and ideally should be minimized. The number of connections to
each PE or node of the graph is limited to small values due to I/O
constraints. This limits the connectivity of graphs that can be
efficiently implemented. The degree of a graph is the number of lines
connected to each node. Electronic systems limit the degree of the graph
to a relatively small value; for large enough N the degree must be a
constant, independent of N.

Optical systems have no I/O restrictions on the PE's per se, but as
discussed above the degree of the graph will be limited by the complexity
of the PE's. Since the PE complexity is Ω(log N) anyway, in the optics
case the degree of the graph can easily be O(log N). Larger degrees, e.g.
O(N^(1/p)), where p > 2, may also be feasible.

In  order  to  calculate  communication  times  on  a  network 
model,  certain  assumptions  need  to  be  specified.  We  assume 
that  all  messages  are  the  same  size  and  are  routed  to  their 
destinations  over  the  fixed  connection  network  by  passing 
over links and through PE's. One time step is defined as the
time for a PE to send a message, the message to travel over
one link, be received by the PE at the end of the link, and for
the PE to perform any computation on the message (such as
altering its tag or combining messages that arrive
simultaneously). The processors operate synchronously. Finally, the
number  of  messages  that  can  simultaneously  be  accepted  or 
output  by  each  PE  must  be  considered.  In  the  electronics 
case,  the  number  of  messages  that  can  be  simultaneously 
accepted  by  a  PE  is  relatively  small  (because  of  the  degree 
limitation), and will probably need to be a constant
independent of N (Kushner and Rosenfeld, 1983). For simplicity this
can be taken to be 1. A PE can output identical copies of the
same message, but not multiple messages. For the optics
case, we assume only a limit on the PE complexity; this then
dictates how flexible the inputs and outputs of the PE can be.
We limit the PE complexity to the degree of the network or
log N, whichever is greater. Each PE can accept d simultaneous
messages, where d is the degree of the network, and may
increase with N. Each PE can output d identical messages
simultaneously; outputting different messages simultaneously
(in conjunction with inputting several messages simultaneously)
can involve an increase in PE complexity, depending on
what the PE is required to do.

Kushner and Rosenfeld (1983) classify communication
tasks as one-to-many, many-to-one, and one-to-one. One-to-many
tasks include broadcasting, in which one PE (the root if
there is a node so distinguished) sends the same message to
many (or all) other PE's; in general the messages may be
altered as they travel to their destinations. Many-to-one
tasks must be divided into two classes. In both classes many
PE's all send messages to the same PE (root). In one case,
condensing, the messages can be combined (e.g., added) en
route to the destination; in the case of funneling, the messages
must be kept separate. One-to-one tasks are permutations, in
which each PE sends a message to one other PE. In the
worst case, half the PE's send messages to the other half, each
message with a different destination PE.

Optical fixed interconnections can be implemented using
lenses, free space propagation, and computer-generated
holograms. Let N be the number of nodes being interconnected.
In an optical holographic interconnection system, a 2-D array
of nodes is connected to a 2-D array of nodes. A set I of
interconnection patterns can be defined; each interconnection
pattern can be represented by an ordered pair representing
the location or address of the destination node in the array
relative to the location of the source node
(multiple ordered pairs can be used to accommodate fanout).
In one scenario, all interconnections that are implemented
must be in I. Let M be the number of distinct interconnection
patterns that are used. The primary limiting factor on
the number and types of interconnections that can be
implemented in the optics case is given by an upper limit on the
product MN (Jenkins, et al., 1984). This is proportional to
the number of resolvable elements that the hologram(s) must
contain. Current hologram plotting devices give the
approximate limit MN < 4 × 10s. MN is a measure of the space
complexity or hologram area of interconnections implemented
with this optical technique. Figure 5 shows an optical system


that can implement fixed interconnections; it connects a 2-D
array to a 2-D array. The holograms provide the beam splitting
(fanout) and beam steering operations. Each input node
is transferred to a corresponding pixel of the first hologram.
The second hologram stores the different interconnection
patterns, and the first hologram addresses the appropriate
interconnection pattern(s) for each node. Each stored
interconnection pattern can be used by multiple nodes simultaneously.
This system can be used to implement any of the network
topologies discussed in this section.

Fig. 5. Optical system for implementing fixed interconnections.

The worst-case order of magnitude communication time
for several networks of different topologies and degrees is
given in Table 2. The array and binary tree take the same
time under our optics assumptions as under our electronics
assumptions. The optics assumptions allow degrees that are a
function of N, which further reduce the time of communication
tasks. The fully interconnected array is impractical for
large N in both optical and electronic cases; however, the
upper limit on N that is feasible is probably higher in the
optics case than in electronics. Table 2 also gives asymptotic
upper bounds for the optical interconnection space complexity
(MN), and asymptotic lower bounds for area in VLSI. A
common grid model was used for the VLSI area complexity
(Ullman, J. D., 1984). While optical and electronic
technologies are different, for large N differences in complexity can
outweigh differences in technology.
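
To make the dependence of communication time on topology concrete, the following minimal sketch computes the graph diameter of a few fixed networks, i.e. the largest number of links any single message must cross. Under the one-link-per-time-step model described above, the diameter is a lower bound on the worst-case time of a one-to-one task. It is not a reproduction of the Table 2 entries; the function name `worst_case_hops` and the topology labels are illustrative assumptions.

```python
import math


def worst_case_hops(topology: str, n: int) -> int:
    """Return the graph diameter: the most links any single message must cross."""
    if topology == "array":
        # sqrt(n) x sqrt(n) mesh without wraparound; corner-to-corner path.
        side = math.isqrt(n)
        if side * side != n:
            raise ValueError("array needs a perfect-square node count")
        return 2 * (side - 1)
    if topology == "binary_tree":
        # Complete binary tree with n = 2**k - 1 nodes; leaf-to-leaf via the root.
        if n < 1 or (n + 1) & n:
            raise ValueError("binary tree needs n = 2**k - 1 nodes")
        depth = (n + 1).bit_length() - 2
        return 2 * depth
    if topology == "hypercube":
        # n = 2**d nodes, each of degree d = log2(n).
        if n < 1 or n & (n - 1):
            raise ValueError("hypercube needs a power-of-two node count")
        return n.bit_length() - 1
    if topology == "fully_connected":
        return 1
    raise ValueError(f"unknown topology: {topology}")


for name, n in [("array", 1024), ("binary_tree", 1023),
                ("hypercube", 1024), ("fully_connected", 1024)]:
    print(f"{name:16s} N={n:5d} worst-case hops={worst_case_hops(name, n)}")
```

The hypercube, whose degree grows as log N, is the kind of network that the optics assumptions admit and the constant-degree electronics assumptions do not.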

CONCLUSIONS 

Parallel  computational  models  offer  a  fundamental  basis 
for understanding parallel computation and abstract out the
limitations  posed  by  a  particular  technology.  Actual  parallel 
computer  architectures  designed  with  these  models  in  mind 
should  exhibit  performance  similar  to  the  parallel  computa¬ 
tional  models.  We  have  found  some  potential  advantages  of 
implementing  these  parallel  models  in  optical  computing 
architectures; advantages such as better contention-free access
to  global  memories,  associated  reconfigurable  interconnection 
networks,  and  implementation  of  increased-degree  PE  net¬ 
works.  We  also  pointed  out  that  the  connectivity  of  optical 
systems  is  not  unlimited,  but  limited  by  the  complexity  of  the 
components  that  are  being  connected.  Finally,  based  on  these 
factors,  we  conjecture  that  optics  provides  the  potential  for 
allowing  a  closer  realization  of  linear  computational  speed-up 
than electronics.

Abstract: This paper describes several texture segmentation
algorithms based on deterministic and stochastic relaxation principles, and
their implementation on parallel networks. The segmentation problem
is posed as an optimization problem and two different optimality
criteria are considered. The first criterion involves maximizing the
posterior distribution of the label field given the intensity field
(maximum a posteriori (MAP) estimate). The posterior distribution of the
texture labels is derived by modeling the textures as Gauss Markov
random fields (GMRF) and characterizing the distribution of different
texture labels by a discrete multilevel Markov model. Fast approximate
solutions for MAP are obtained using deterministic relaxation
techniques implemented on a Hopfield neural network and are
compared with those of simulated annealing in obtaining the MAP estimate.
A stochastic algorithm which introduces learning into the iterations of
the Hopfield network is proposed. This iterated hill-climbing algorithm
combines fast convergence of deterministic relaxation with the
sustained exploration of the stochastic algorithms, but is guaranteed to
find only a local minimum. The second optimality criterion requires
minimizing the expected percentage of misclassification per pixel by
maximizing the posterior marginal distribution, and the maximum
posterior marginal (MPM) algorithm is used to obtain the corresponding
solution. All these methods implemented on parallel networks can
be easily extended for hierarchical segmentation and we present results
of the various schemes in classifying some real textured images.


I.  Introduction 

THIS  PAPER  describes  several  algorithms,  both  de¬ 
terministic  and  stochastic,  for  the  segmentation  of 
textured  images.  Segmentation  of  image  data  is  an  im¬ 
portant  problem  in  computer  vision,  remote  sensing,  and 
image  analysis.  Most  objects  in  the  real  world  have  tex¬ 
tured  surfaces.  Segmentation  based  on  texture  informa¬ 
tion  is  possible  even  if  there  are  no  apparent  intensity 
edges  between  the  different  regions.  There  are  many  ex¬ 
isting  methods  for  texture  segmentation  and  classifica¬ 
tion,  based  on  different  types  of  statistics  that  can  be  ob¬ 
tained  from  the  gray  level  images.  The  approach  we  use 
stems  from  the  idea  of  using  Markov  random  field  models 
for textures in an image. We assign two random variables
for the observed pixel, one characterizing the underlying
intensity and the other for labeling the texture corresponding
to the pixel location. We use the Gauss Markov random
field (GMRF) model for the conditional density of
the intensity field given the label field. Prior information
about the texture label field is introduced using a discrete
Markov distribution. The segmentation can then be
formulated as an optimization problem involving minimization
of a Gibbs energy function. Exhaustive search for the
optimum solution is not possible because of the large
dimensionality of the search space. For example, even for
the very simple case of segmenting a 128 x 128 image
into two classes, there are $2^{2^{14}}$ possible label
configurations. Derin and Elliott [1] have investigated the use of
dynamic  programming  for  obtaining  an  approximation  to 
the  maximum  a  posteriori  (MAP)  estimate  while  Cohen 
and  Cooper  [2]  give  a  deterministic  relaxation  algorithm 
for  the  same  problem.  The  optimal  MAP  solution  can  be 
obtained  by  using  stochastic  relaxation  algorithms  such  as 
simulated  annealing  [3].  Recently  there  has  been  consid¬ 
erable  interest  in  using  neural  networks  for  solving  com¬ 
putationally  hard  problems  and  the  main  emphasis  in  this 
paper  is  on  developing  parallel  algorithms  which  can  be 
implemented  on  such  networks  of  simple  processing  ele¬ 
ments  (neurons). 
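
The size of the search space quoted above can be checked with a short calculation. The sketch below is only an illustration; the helper names `num_label_configurations` and `decimal_digits` are hypothetical and not part of the paper.

```python
import math


def num_label_configurations(m: int, k: int) -> int:
    """Number of ways to assign one of k labels to each pixel of an m x m image."""
    return k ** (m * m)


def decimal_digits(m: int, k: int) -> int:
    """Digits in the count, via logarithms to avoid printing the huge integer."""
    return math.floor(m * m * math.log10(k)) + 1


count = num_label_configurations(128, 2)
print(count == 2 ** (2 ** 14))   # 128 * 128 = 16384 = 2**14 binary choices
print(decimal_digits(128, 2))    # number of decimal digits in that count
```

Even this two-class case yields a count with thousands of decimal digits, which is why the paper turns to relaxation methods rather than enumeration.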

The inherent parallelism of neural networks provides an
interesting architecture for implementing many computer
vision algorithms [4]. Some examples are image restoration
[5], stereopsis [6], and computing optical flow [7]-[9].
Networks for solving combinatorially hard problems
such as the traveling salesman problem have received
much attention in the neural network literature [10]. In
almost all cases, these networks are designed to minimize
an energy function defined by the network architecture.
The parameters of the network are obtained in terms of
the energy (cost) function it is designed to minimize and
it can be shown [10] that for networks having symmetric
interconnections, the equilibrium states correspond to the
local minima of the energy function. For practical
purposes, networks with few interconnections are preferred
because of the large number of processing units required
in any image processing application. In this context
Markov random field (MRF) models for images play a useful
role. They are typically characterized by local dependencies
and symmetric interconnections which can be
expressed in terms of energy functions using Gibbs-Markov
equivalence [3].

We look into two different optimality criteria for
segmenting the image. The first corresponds to the label
configuration which maximizes the posterior probability of
the label array given the intensity array. As noted before,
an exhaustive search for the optimal solution is practically
impossible. An alternative is to use stochastic relaxation
algorithms such as simulated annealing [3], which
asymptotically converge to the optimal solution. However, the
computational burden imposed by the theoretical
requirements on the initial temperature and by the impractical
cooling schedules outweighs their advantages in many
cases. Fast approximate solutions can be obtained by such
deterministic relaxation algorithms as the iterated
conditional mode rule [11]. The energy function corresponding
to this optimality criterion can be mapped into a Hopfield-type
network in a straightforward manner and it can be
shown that the network converges to an equilibrium state,
which in general will be a local optimum. The solutions
obtained using this method are sensitive to the initial
configurations, and in many cases starting with a maximum
likelihood estimate is preferred. Stochastic learning can
be easily introduced into the network, and the overall
system improves the performance by learning while
searching. The learning algorithms used are derived from the
theory of stochastic learning automata [12] and we
believe that this is the first time such a hybrid system has
been used in an optimization problem. The stochastic
nature of the system helps in preventing the algorithm from
being trapped in a local minimum and we observe that this
improves the quality of the solutions obtained.

The  second  optimality  criterion  minimizes  the  expected 
percentage of classification error per pixel. This is
equivalent to finding the pixel labels that maximize the
marginal posterior probability given the intensity data [13].
Since  calculating  the  marginal  posterior  probability  is  very 
difficult,  Marroquin  [14]  suggested  the  MPM  algorithm, 
which  asymptotically  computes  the  posterior  marginal.  In 
[14]  the  algorithm  is  used  for  image  restoration,  stereo 
matching,  and  surface  interpolation.  Here  we  use  this 
method  to  find  the  texture  label  that  maximizes  the  mar¬ 
ginal  posterior  probability  for  each  pixel. 

The  organization  of  the  paper  is  as  follows:  Section  II 
describes  the  image  model.  A  neural  network  model  for 
the  relaxation  algorithms  is  given  in  Section  III  along  with 
a  deterministic  updating  rule.  Section  IV  discusses  the 
stochastic  algorithms  for  segmentation  and  their  parallel 
implementation on the network. A learning algorithm is
proposed in Section V and the experimental results are
provided in Section VI.

II. Image Model

The use of MRF models for image processing applications
has been investigated by many researchers (see, e.g.,
Chellappa [15]). Cross and Jain [16] provide a detailed
discussion on the application of MRF in modeling
textured images. Geman and Geman [3] discuss the
equivalence between MRF and Gibbs distributions. The GMRF
model for the texture intensity process has been used in
[1], [2], and [17]. The MRF is also used to describe the
label process in [1] and [2]. In this paper we use the
fourth-order GMRF indicated in Fig. 1 to model the
conditional probability density of the image intensity array
given its texture labels. The texture labels are assumed to
obey a first- or second-order discrete Markov model with
a single parameter β, which measures the amount of
clustering between adjacent pixels.

        7 6 7
      5 4 3 4 5
    7 4 2 1 2 4 7
    6 3 1 x 1 3 6
    7 4 2 1 2 4 7
      5 4 3 4 5
        7 6 7

Fig. 1. Structure of the GMRF model. The numbers indicate the order of
the model relative to x [16].

Let $\Omega$ denote the set of grid points in the $M \times M$ lattice,
i.e., $\Omega = \{(i, j),\ 1 \le i, j \le M\}$. Following Geman and
Graffigne [18], we construct a composite model which
accounts for texture labels and gray levels. Let $\{L_s, s \in \Omega\}$
and $\{Y_s, s \in \Omega\}$ denote the labels and zero mean gray
level arrays respectively. The zero mean array is obtained
by subtracting the local mean computed in a small
window centered at each pixel. Let $N_s$ denote the symmetric
fourth-order neighborhood of a site $s$. Then, assuming that
all the neighbors of $s$ also have the same label as that of
$s$, we can write the following expression for the
conditional density of the intensity at the pixel site $s$:

$$P(Y_s = y_s \mid Y_r = y_r, r \in N_s, L_s = l) = \frac{\exp[-U(Y_s = y_s \mid Y_r = y_r, r \in N_s, L_s = l)]}{Z(l \mid y_r, r \in N_s)} \tag{1}$$

where $Z(l \mid y_r, r \in N_s)$ is the partition function of the
conditional Gibbs distribution and

$$U(Y_s = y_s \mid Y_r = y_r, r \in N_s, L_s = l) = \frac{1}{2\sigma_l^2}\left(y_s^2 - 2\sum_{r \in N_s} \Theta^l_{s,r}\, y_s y_r\right). \tag{2}$$

In (2), $\sigma_l$ and $\Theta^l$ are the GMRF model parameters of
the $l$th texture class. The model parameters satisfy
$\Theta^l_{r,s} = \Theta^l_{r-s} = \Theta^l_{s-r} = \Theta^l_\tau$.

We view the image intensity array as composed of a set
of overlapping $k \times k$ windows $W_s$, centered at each pixel
$s \in \Omega$. In each of these windows we assume that the
texture label $L_s$ is homogeneous (all the pixels in the window
belonging to the same texture) and compute the joint
distribution of the intensity in the window conditioned on $L_s$.
The corresponding Gibbs energy is used in the relaxation
process for segmentation. As explained in the previous
paragraph, the image intensity in the window is modeled
by a fourth-order stationary GMRF. The local mean is
computed by taking the average of the intensities in the
window $W_s$ and is subtracted from the original image to
get the zero mean image. All our references to the
intensity array correspond to the zero mean image. Let $Y_s^*$
denote the 2-D vector representing the intensity array in the
window $W_s$. Using the Gibbs formulation and assuming a
free boundary model, the joint probability density in the
window $W_s$ can be written as [2]

$$P(Y_s^* \mid L_s = l) = \frac{\exp[-U_1(Y_s^* \mid L_s = l)]}{Z_1(l)}$$

where $Z_1(l)$ is the partition function and

$$U_1(Y_s^* \mid L_s = l) = \frac{1}{2\sigma_l^2} \sum_{r \in W_s} \Big\{ y_r^2 - \sum_{\tau \in N^* \mid r+\tau \in W_s} \Theta^l_\tau\, y_r (y_{r+\tau} + y_{r-\tau}) \Big\}. \tag{3}$$

$N^*$ is the set of shift vectors corresponding to a fourth-order
neighborhood system:

$$N^* = \{\tau_1, \tau_2, \tau_3, \ldots, \tau_{10}\} = \{(0,1), (1,0), (1,1), (-1,1), (0,2), (2,0), (1,2), (2,1), (-1,2), (-2,1)\}.$$
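
As a rough illustration of how the window energy in (3) might be evaluated, the sketch below sums the quadratic and neighbor-interaction terms over a small zero-mean window. Several choices are assumptions rather than statements from the paper: the name `window_energy`, the representation of the window as a list of rows, the reading of each shift vector as (row offset, column offset), and the treatment of the free boundary by dropping any neighbor that falls outside the window.

```python
# Shift vectors of the fourth-order neighbourhood N* (values taken from the text).
N_STAR = [(0, 1), (1, 0), (1, 1), (-1, 1), (0, 2), (2, 0),
          (1, 2), (2, 1), (-1, 2), (-2, 1)]


def window_energy(window, theta, sigma2):
    """Gibbs energy of a square zero-mean window for one texture class.

    window : list of rows of zero-mean intensities
    theta  : ten interaction parameters, one per shift vector in N_STAR
    sigma2 : variance parameter of the class (sigma_l squared)
    """
    k = len(window)

    def inside(a, b):
        return 0 <= a < k and 0 <= b < k

    total = 0.0
    for i in range(k):
        for j in range(k):
            y = window[i][j]
            total += y * y
            for (di, dj), th in zip(N_STAR, theta):
                # Assumed free-boundary handling: outside neighbours contribute 0.
                plus = window[i + di][j + dj] if inside(i + di, j + dj) else 0.0
                minus = window[i - di][j - dj] if inside(i - di, j - dj) else 0.0
                total -= th * y * (plus + minus)
    return total / (2.0 * sigma2)


demo = [[1.0, 2.0], [3.0, 4.0]]
print(window_energy(demo, [0.0] * 10, 1.0))          # no interactions
print(window_energy(demo, [0.5] + [0.0] * 9, 1.0))   # horizontal coupling only
```

With all interaction parameters set to zero the energy reduces to the sum of squared intensities divided by twice the variance, which gives a quick sanity check on any implementation.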

The label field is modeled as a first- or second-order
discrete MRF. If $\hat{N}_s$ denotes the appropriate neighborhood
for the label field, then we can write the distribution
function for the texture label at site $s$ conditioned on the labels
of the neighboring sites as

$$P(L_s \mid L_r, r \in \hat{N}_s) = \frac{e^{-U_2(L_s \mid L_r)}}{Z_2}$$

where $Z_2$ is a normalizing constant and

$$U_2(L_s \mid L_r, r \in \hat{N}_s) = -\beta \sum_{r \in \hat{N}_s} \delta(L_s - L_r), \qquad \beta > 0. \tag{4}$$

In (4), $\beta$ determines the degree of clustering, and $\delta(i - j)$
is the Kronecker delta. Using the Bayes rule, we can
write

$$P(L_s \mid Y_s^*, L_r, r \in \hat{N}_s) = \frac{P(Y_s^* \mid L_s)\, P(L_s \mid L_r, r \in \hat{N}_s)}{P(Y_s^*)}. \tag{5}$$

Since $Y_s^*$ is known, the denominator in (5) is just a
constant. The numerator is a product of two exponential
functions and can be expressed as

$$P(L_s \mid Y_s^*, L_r, r \in \hat{N}_s) = \frac{1}{Z_p} \exp[-U_p(L_s \mid Y_s^*, L_r, r \in \hat{N}_s)] \tag{6}$$

where $Z_p$ is the partition function and $U_p(\cdot)$ is the
posterior energy corresponding to (5). From (3) and (4) we
write

$$U_p(L_s \mid Y_s^*, L_r, r \in \hat{N}_s) = w(L_s) + U_1(Y_s^* \mid L_s) + U_2(L_s \mid L_r, r \in \hat{N}_s). \tag{7}$$

Note that the second term in (7) relates the observed pixel
intensities to the texture labels and the last term specifies
the label distribution. The bias term $w(L_s) = \log Z_1(L_s)$
is dependent on the texture class and it can be explicitly
evaluated for the GMRF model considered here using the
toroidal assumption (the computations become very
cumbersome if toroidal assumptions are not made). An
alternative approach is to estimate the bias from the histogram
of the data as suggested by Geman and Graffigne [18].
Finally, the posterior distribution of the texture labels for
the entire image given the intensity array is


$$P(L \mid Y^*) = \frac{P(Y^* \mid L)\, P(L)}{P(Y^*)}. \tag{8}$$

Maximizing (8) gives the optimal Bayesian estimate.
Though it is possible in principle to compute the
right-hand side of (8) and find the global optimum, the
computational burden involved is so enormous that it is
practically impossible to do so. However, we note that the
stochastic relaxation algorithms discussed in Section IV
require only the computation of (6) to obtain the optimal
solution. The deterministic relaxation algorithm given in
the next section also uses these values, but in this case
the solution is only an approximation to the MAP
estimate.


III.  A  Neural  Network  for  Texture 
Classification 

We  describe  the  network  architecture  used  for  segmen¬ 
tation  and  the  implementation  of  deterministic  relaxation 
algorithms.  The  energy  function  which  the  network  min¬ 
imizes  is  obtained  from  the  image  model  discussed  in  the 
previous  section.  For  convenience  of  notation  let 
$U_1(i, j, l) = U_1(Y_s^*, L_s = l) + w(l)$, where $s = (i, j)$ denotes
a pixel site and $U_1(\cdot)$ and $w(l)$ are as defined in (7). The
network consists of $K$ layers, each layer arranged as an
$M \times M$ array, where $K$ is the number of texture classes in
the image and $M$ is the dimension of the image. The
elements (neurons) in the network are assumed to be binary
and are indexed by $(i, j, l)$ where $(i, j) = s$ refers to
their position in the image and $l$ refers to the layer. The
$(i, j, l)$th neuron is said to be on if its output $V_{ijl}$ is 1,
indicating that the corresponding site $s = (i, j)$ in the
image has the texture label $l$. Let $T_{ijl;i'j'l'}$ be the connection
strength between neurons $(i, j, l)$ and $(i', j', l')$ and $I_{ijl}$
be the input bias current. Then a general form for the
energy of the network is [10]

$$E = -\frac{1}{2} \sum_{i=1}^{M} \sum_{j=1}^{M} \sum_{l=1}^{K} \sum_{i'=1}^{M} \sum_{j'=1}^{M} \sum_{l'=1}^{K} T_{ijl;i'j'l'}\, V_{ijl} V_{i'j'l'} - \sum_{i=1}^{M} \sum_{j=1}^{M} \sum_{l=1}^{K} I_{ijl} V_{ijl}. \tag{9}$$

From our discussion in Section II, we note that a
solution for the MAP estimate can be obtained by maximizing
(8), or equivalently by minimizing the corresponding posterior
energy. Here we approximate the posterior energy by

$$U(L \mid Y^*) = \sum_{s} \{ U_1(Y_s^* \mid L_s) + w(L_s) + U_2(L_s) \} \tag{10}$$


and the corresponding Gibbs energy to be minimized can
be written as

$$E = \sum_{i=1}^{M} \sum_{j=1}^{M} \sum_{l=1}^{K} U_1(i, j, l)\, V_{ijl} - \frac{\beta}{2} \sum_{l=1}^{K} \sum_{i=1}^{M} \sum_{j=1}^{M} \sum_{(i',j') \in N_{ij}} V_{i'j'l} V_{ijl} \tag{11}$$

where $N_{ij}$ is the neighborhood of site $(i, j)$ (same as the
$\hat{N}_s$ in Section II). In (11), it is implicitly assumed that
each pixel site has a unique label; i.e., only one neuron
is active in each column of the network. This constraint
can be implemented in different ways. For the
deterministic relaxation algorithm described below, a simple
method is to use a winner-takes-all circuit for each
column so that the neuron receiving the maximum input is
turned on and the others are turned off. Alternatively, a
penalty term can be introduced in (11) to represent the
constraint as in [10]. From (9) and (11) we can identify
the parameters for the network:

$$T_{ijl;i'j'l'} = \begin{cases} \beta & \text{if } (i', j') \in N_{ij} \text{ and } l = l' \\ 0 & \text{otherwise} \end{cases} \tag{12}$$

and the bias current

$$I_{ijl} = -U_1(i, j, l). \tag{13}$$

A.  Deterministic  Relaxation 

The above equations (12) and (13) relate the parameters
of the network to those of the image model. The connection
matrix for the above network is symmetric and there is no
self-feedback; i.e., $T_{ijl;ijl} = 0$, $\forall\, i, j, l$. Let $u_{ijl}$ be the
potential of neuron $(i, j, l)$. With $l$ the layer number
corresponding to texture class $l$, we have

$$u_{ijl} = \sum_{i'=1}^{M} \sum_{j'=1}^{M} \sum_{l'=1}^{K} T_{ijl;i'j'l'}\, V_{i'j'l'} + I_{ijl}. \tag{14}$$

In order to minimize (11), we use the following updating
rule:

$$V_{ijl} = \begin{cases} 1 & \text{if } u_{ijl} = \max_{l'} \{u_{ijl'}\} \\ 0 & \text{otherwise.} \end{cases} \tag{15}$$

This updating scheme ensures that at each stage the
energy decreases. Since the energy is bounded, the
convergence of the above system is ensured but the stable state
will in general be a local optimum.

This network model is a version of the iterated
conditional mode (ICM) algorithm of Besag [11]. This
algorithm maximizes the conditional probability
$P(L_s = l \mid Y_s^*, L_r, r \in \hat{N}_s)$ during each iteration. It is a local
deterministic relaxation algorithm that is very easy to
implement. We observe that in general an algorithm based
on MRF models can be easily mapped onto neural
networks with local interconnections. The main advantage of
this deterministic relaxation algorithm is its simplicity.
Often the solutions are reasonably good and the algorithm
usually converges within 20-30 iterations. In the next
section  we  study  two  stochastic  schemes  which  asymp¬ 
totically  converge  to  the  global  optimum  of  the  respective 
criterion  functions. 
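
The deterministic relaxation described in this section can be sketched in a few lines. The sketch is a sequential (site-by-site) variant, which is an assumption made here so that each update cannot raise the energy; it is not the parallel network itself. The data energies are taken as given numbers standing in for the combined data and bias term, the label neighborhood is assumed to be first order (four nearest neighbors), and the name `icm_segment` is hypothetical.

```python
def icm_segment(data_energy, beta, max_iters=30):
    """Winner-takes-all relaxation on a grid of per-pixel, per-label data energies.

    data_energy[i][j][l] is the data energy of label l at pixel (i, j).
    Returns a grid of labels that is a local optimum, not necessarily the MAP.
    """
    rows, cols = len(data_energy), len(data_energy[0])
    k = len(data_energy[0][0])
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # assumed first-order neighbourhood

    # Maximum-likelihood start: ignore the neighbours, pick the cheapest label.
    labels = [[min(range(k), key=lambda l: data_energy[i][j][l])
               for j in range(cols)] for i in range(rows)]

    def potential(i, j, l):
        # Neuron input: beta per agreeing neighbour minus the data energy.
        agree = sum(1 for di, dj in offsets
                    if 0 <= i + di < rows and 0 <= j + dj < cols
                    and labels[i + di][j + dj] == l)
        return beta * agree - data_energy[i][j][l]

    for _ in range(max_iters):
        changed = False
        for i in range(rows):
            for j in range(cols):
                best = max(range(k), key=lambda l: potential(i, j, l))
                # Switch only on a strict improvement so the loop terminates.
                if potential(i, j, best) > potential(i, j, labels[i][j]):
                    labels[i][j] = best
                    changed = True
        if not changed:
            break
    return labels


energy = [[[0.0, 1.0] for _ in range(3)] for _ in range(3)]
energy[1][1] = [0.6, 0.4]   # centre pixel weakly prefers label 1
print(icm_segment(energy, beta=1.0))
print(icm_segment(energy, beta=0.0))
```

In the toy grid, a positive clustering parameter lets the neighbors overrule the weak preference of the center pixel, while setting it to zero leaves the maximum-likelihood labeling unchanged.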

IV. Stochastic Algorithms for Texture
Segmentation

We  look  at  two  optimal  solutions  corresponding  to  dif¬ 
ferent  decision  rules  for  determining  the  labels.  The  first 
one  uses  simulated  annealing  to  obtain  the  optimum  MAP 
estimate  of  the  label  configuration.  The  second  algorithm 
minimizes the expected misclassification per pixel. The
parallel  network  implementation  of  these  algorithms  is 
discussed  in  Section  IV-C. 


A.  Searching  for  MAP  Solution 

The MAP rule [18] searches for the configuration $L$ that
maximizes the posterior probability distribution. This is
equivalent to maximizing $P(Y^* \mid L)\, P(L)$ as $P(Y^*)$ is
independent of the labels and $Y^*$ is known. The
right-hand side of (8) is a Gibbs distribution. To maximize (8)
we use simulated annealing [3], a combinatorial
optimization method which is based on sampling from varying
Gibbs distribution functions

$$\exp\left[-\frac{1}{T_k}\, U_p(L_s \mid Y_s^*, L_r, r \in \hat{N}_s)\right]$$

in order to maximize the posterior distribution,
$T_k$ being the time-varying parameter, referred to as the
temperature. We used the following cooling schedule:


T \  = 


I  +  log2  k 


(16) 


where $k$ is the iteration number. When the temperature is
high, the bond between adjacent pixels is loose, and the
distribution tends to behave like a uniform distribution
over the possible texture labels. As $T_k$ decreases, the
distribution concentrates on the lower values of the energy
function, which correspond to points with higher
probability. The process is bound to converge to a uniform
distribution over the label configuration that corresponds to
the MAP solution. Since the number of texture labels is
finite, convergence of this algorithm follows from [3]. In
our experiment, we found that starting the iterations
with $T_0 = 2$ did not guarantee convergence to the MAP
solution. Since starting at a much higher temperature will
slow the convergence of the algorithm significantly, we
use an alternative approach, viz., cycling the temperature
[13]. We follow the annealing schedule until $T_k$ reaches a
lower  bound;  then  we  reheat  the  system  and  start  a  new 
cooling  process.  By  using  only  a  few  cycles,  we  obtained 
results  better  than  those  with  a  single  cooling  cycle.  Par¬ 
allel  implementation  of  simulated  annealing  on  the  net¬ 
work  is  discussed  in  Section  IV-C.  The  results  we  present 
in  Section  VI  were  obtained  with  two  cycles. 
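
The effect of temperature and the idea of cycling can be sketched as follows. This is only an illustration under stated assumptions: the constant in the numerator of the cooling schedule is treated as a free parameter `t_start`, the lower bound `t_min` and the number of cycles are arbitrary, and the function names are hypothetical.

```python
import math


def gibbs_probabilities(energies, temperature):
    """Probabilities proportional to exp(-U / T) over the candidate labels."""
    low = min(energies)  # subtract the minimum for numerical stability
    weights = [math.exp(-(u - low) / temperature) for u in energies]
    total = sum(weights)
    return [w / total for w in weights]


def cycled_temperatures(t_start, t_min, cycles):
    """Yield T_k = t_start / (1 + log2 k), reheating once T_k drops below t_min."""
    for _ in range(cycles):
        k = 1
        while True:
            t = t_start / (1.0 + math.log2(k))
            if t < t_min:
                break  # lower bound reached: reheat and start a new cycle
            yield t
            k += 1


temps = list(cycled_temperatures(t_start=2.0, t_min=0.5, cycles=2))
print(len(temps), temps[0], temps[-1])
print(gibbs_probabilities([0.0, 1.0], 1000.0))  # hot: close to uniform
print(gibbs_probabilities([0.0, 1.0], 0.01))    # cold: mass on the lowest energy
```

A full annealer would draw each pixel label from `gibbs_probabilities` evaluated on its posterior energies at the current temperature, visiting all pixels once per value of k.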

B. Maximizing the Posterior Marginal Distribution

The  choice  of  the  objective  function  for  optimal  seg¬ 
mentation  can  significantly  affect  its  result.  The  choice 
should  be  made  depending  on  the  purpose  of  the  classifi¬ 
cation.  In  many  implementations  the  most  reasonable  ob¬ 
jective  function  is  the  one  that  minimizes  the  expected 
percentage  misclassification  per  pixel.  The  solution  to  the 
above  objective  function  is  also  the  one  that  maximizes 
the  marginal  posterior  distribution  of  Ls  given  the  obser¬ 
vation  Y*  for  each  pixel  s: 

$$P\{L_s = l_s \mid Y^* = y^*\} \propto \sum_{l \mid L_s = l_s} P(Y^* = y^* \mid L = l)\, P(L = l).$$

The summation above extends over all possible label
configurations keeping the label at site $s$ constant. This
concept was thoroughly investigated in [14]. Marroquin
[19] discusses this formulation in the context of image
restoration and illustrates the performance on images with
few gray levels. He also mentions the possibility of using
this objective function for texture segmentation. In [11]
the same objective function is mentioned in the context of
image estimation.

To find the optimal solution we use the stochastic algorithm
suggested in [14]. The algorithm samples out of
the posterior distribution of the texture labels given the
intensity. Unlike the stochastic relaxation algorithm,
samples are taken with a fixed temperature $T = 1$. The
Markov chain associated with the sampling algorithm
converges with probability 1 to the posterior distribution.
We define new random variables $g_s^{l,t}$ for each pixel ($s \in \Omega$):

$$g_s^{l,t} = \begin{cases} 1 & \text{if } L_s^t = l \\ 0 & \text{otherwise} \end{cases}$$

where $L_s^t$ is the class of pixel $s$, at time $t$, in the state
vector of the Markov chain associated with the Gibbs
sampler. We use the ergodic property of the Markov chain
[20] to calculate the expectations for these random variables
using time averaging:

$$E\{g_s^{l,t}\} = \lim_{N \to \infty} \frac{1}{N} \sum_{t=1}^{N} g_s^{l,t} = P(L_s = l \mid Y^*)$$

where $N$ is the number of iterations performed. To obtain
the optimal class for each pixel, we simply choose the class
that occurred more often than the others.

The MPM algorithm was implemented using the Gibbs
sampler [3]. A much wider set of sampling algorithms,
such as Metropolis, can be used for this purpose. The algorithms
can be implemented sequentially or in parallel,
with a deterministic or stochastic decision rule for the order
of visiting the pixels. In order to avoid the dependence
on the initial state of the Markov chain, we can ignore the
first few iterations. In the experiments conducted we obtained
good results after 500 iterations. The algorithm
does not suffer from the drawbacks of simulated annealing.
For instance, we do not have to start the iterations
with a high temperature to avoid local minima, and the
performance is not badly affected by enlarging the state
space.
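
The MPM procedure described above can be illustrated with a minimal sketch. It is not the implementation used in the experiments: it assumes a toy one-dimensional chain of sites, a hypothetical per-site data cost table `cost[s][l]`, and a simple prior that lowers the energy by `beta` for each agreeing neighbor. The names `gibbs_sweep` and `mpm_estimate` and the default `burn_in` and `keep` values are illustrative choices only.

```python
import math
import random

def gibbs_sweep(labels, cost, beta, rng):
    """One full sweep of the Gibbs sampler at fixed temperature T = 1."""
    n, num_labels = len(labels), len(cost[0])
    for s in range(n):
        weights = []
        for l in range(num_labels):
            # Local energy: data cost minus beta for each agreeing neighbour.
            energy = cost[s][l]
            if s > 0 and labels[s - 1] == l:
                energy -= beta
            if s < n - 1 and labels[s + 1] == l:
                energy -= beta
            weights.append(math.exp(-energy))
        # Sample the new label of site s from its local conditional distribution.
        labels[s] = rng.choices(range(num_labels), weights=weights)[0]

def mpm_estimate(cost, beta, burn_in=100, keep=400, seed=0):
    """Count label occurrences after burn-in and pick the most frequent label."""
    rng = random.Random(seed)
    n, num_labels = len(cost), len(cost[0])
    labels = [rng.randrange(num_labels) for _ in range(n)]
    counts = [[0] * num_labels for _ in range(n)]
    for sweep in range(burn_in + keep):
        gibbs_sweep(labels, cost, beta, rng)
        if sweep >= burn_in:  # ignore the first sweeps (initial-state dependence)
            for s in range(n):
                counts[s][labels[s]] += 1
    return [max(range(num_labels), key=lambda l: counts[s][l]) for s in range(n)]

# Toy data: the first five sites favour label 0, the last five favour label 1.
toy_cost = [[0.0, 4.0]] * 5 + [[4.0, 0.0]] * 5
estimate = mpm_estimate(toy_cost, beta=0.5)
```

The per-site counters play the role of the time averages of $g_s^{l,t}$; no temperature schedule is needed because all samples are drawn at $T = 1$.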


C.  Network  Implementation  of  the  Sampling  Algorithms 

All the stochastic algorithms described in the Gibbs formulation
are based on sampling from a probability distribution.
The probability distribution is constant in the
MPM algorithm [14] and is time varying in the case of
annealing. The need for parallel implementation is due to
the heavy computational load associated with their use.

The issue of parallel implementation in stochastic algorithms
was first addressed by Geman and Geman [3].
They show that the Gibbs sampler can be implemented by
any deterministic or stochastic rule for choosing the order
in which pixels are updated, as long as each pixel is visited
infinitely often. An iteration is the time required to
visit each pixel at least once (a full sweep). Note that the
stochastic rules have a random period and allow us to visit
a pixel more than once in a period. They consider the new
Markov chain one obtains from the original by viewing it
only after each iteration. Their proof is based on two essential
elements. The first is the fact that the embedded
Markov chain has a strictly positive transition probability
$p_{ij}$ for any possible states $i, j$, which proves that the chain
will converge to a unique probability measure regardless
of the initial state. The second is that the Gibbs measure
is an invariant measure for the Gibbs sampler, so that the
embedded chain converges to the Gibbs measure. The
proof introduced in [3] can be applied to a much larger
family of sampling algorithms satisfying the following
properties [20]:

1) The sampler produces a Markov chain with a positive
transition probability $p_{ij}$ for any choice of states
$i, j$.

2) The Gibbs measure is invariant under the sampling
algorithm.

The Metropolis and heat bath algorithms are two such
sampling methods. To see that the Metropolis algorithm
satisfies property 2, we look at the following equation for
updating a single pixel:

$$P^{n+1}(i) = \frac{1}{m} \sum_{j \,:\, \pi(j) \le \pi(i)} P^n(j)
 + \frac{1}{m} \sum_{j \,:\, \pi(j) < \pi(i)} P^n(i)\, \frac{\pi(i) - \pi(j)}{\pi(i)}
 + \frac{1}{m} \sum_{j \,:\, \pi(j) > \pi(i)} P^n(j)\, \frac{\pi(i)}{\pi(j)}$$

where $m$ is the number of values each pixel can take and $\pi$ denotes
the Gibbs measure. The
first term corresponds to the cases when the system was
in state $j$ and the new state $i$ has higher (or equal) probability. The
second term corresponds to a system in state $i$ and a new
state $j$ that has lower probability. The given probability is
for staying in state $i$. The third term corresponds to a system
in state $j$ and a new state $i$ with lower probability. If
we now replace $P^{n+1}(i)$ and $P^n(i)$ with $\pi(i)$, and $P^n(j)$ with $\pi(j)$,
we see that the equality holds, implying that the Gibbs
measure is invariant under the Metropolis algorithm. The
first property is also satisfied. Note that the states now
correspond to the global configuration. To implement the
algorithm in parallel, one can update pixels in parallel as
long as neighboring pixels are not updated at the same
time. A very clear discussion on this issue can be found
in [14].
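
The invariance argument can be checked numerically. The sketch below is illustrative only: it assumes a single pixel with $m = 4$ states, arbitrary example energies, and a candidate state drawn uniformly from the $m$ values. The function names `metropolis_step` and `gibbs_measure` are hypothetical.

```python
import math

def metropolis_step(p, pi):
    """Apply one single-pixel Metropolis update to the distribution p.

    p[i]  : probability of being in state i before the update
    pi[i] : target (Gibbs) probability of state i
    The candidate state is drawn uniformly from the m states.
    """
    m = len(pi)
    new_p = [0.0] * m
    for i in range(m):
        total = 0.0
        for j in range(m):
            if pi[j] <= pi[i]:
                # Moves j -> i are always accepted (this includes j == i).
                total += p[j]
            else:
                # Moves j -> i are accepted with probability pi(i) / pi(j).
                total += p[j] * pi[i] / pi[j]
            if pi[j] < pi[i]:
                # Rejected move i -> j: the system stays in state i.
                total += p[i] * (pi[i] - pi[j]) / pi[i]
        new_p[i] = total / m
    return new_p

def gibbs_measure(energies, temperature=1.0):
    """Normalized Gibbs probabilities for a list of state energies."""
    weights = [math.exp(-e / temperature) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

# Arbitrary example energies for the four states of one pixel.
pi = gibbs_measure([0.0, 1.0, 2.5, 0.3])
after = metropolis_step(pi, pi)
max_change = max(abs(a - b) for a, b in zip(after, pi))
```

Starting from the Gibbs measure, `max_change` is zero up to rounding error, which is the invariance property. Starting from any other distribution and iterating `metropolis_step` moves the distribution toward `pi`.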

We now describe how these stochastic algorithms can
be implemented on the network discussed in Section III.
The only modification required for the simulated annealing
rule is that the neurons in the network fire according
to a time-dependent probabilistic rule. Using the same notation
as in Section III, the probability that neuron $(i, j, l)$
will fire during iteration $k$ is

$$P(V_{ijl} = 1) = \frac{e^{-(1/T_k)\, u_{ijl}}}{Z_{T_k}} \qquad (17)$$

where $u_{ijl}$ is as defined in (14) and $T_k$ follows the cooling
schedule (16).

The MPM algorithm uses the above selection rule with
$T_k = 1$. In addition, each neuron in the network has a
counter which is incremented every time the neuron fires.
When the iterations are terminated the neuron in each column
of the network having the maximum count is selected
to represent the label for the corresponding pixel site in
the image.

We  have  noted  before  that  for  parallel  implementation 
of  the  sampling  algorithms,  neighboring  sites  should  not 
be  updated  simultaneously.  Some  additional  observations 
are  made  in  Section  VI. 
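
A small sketch of the probabilistic firing rule and of one way to avoid updating neighboring sites together is given here. It rests on assumptions: the inputs `u` of one column are passed in as a plain list, the normalizing constant is taken to be the sum of the exponentials over that column, and the two-group (checkerboard) split is only valid for a first-order, 4-connected neighborhood. All function names are hypothetical.

```python
import math
import random

def firing_probabilities(u, temperature):
    """Probabilities that each neuron in one column fires, given inputs u."""
    # Subtract the minimum input for numerical stability; the ratio is unchanged.
    u_min = min(u)
    weights = [math.exp(-(x - u_min) / temperature) for x in u]
    z = sum(weights)
    return [w / z for w in weights]

def checkerboard_groups(rows, cols):
    """Split sites into two groups; no two sites in a group are 4-neighbours."""
    even = [(i, j) for i in range(rows) for j in range(cols) if (i + j) % 2 == 0]
    odd = [(i, j) for i in range(rows) for j in range(cols) if (i + j) % 2 == 1]
    return even, odd

def fire_and_count(u, temperature, counters, rng):
    """Fire one neuron of a column at random and increment its counter."""
    probs = firing_probabilities(u, temperature)
    winner = rng.choices(range(len(u)), weights=probs)[0]
    counters[winner] += 1
    return winner
```

For annealing, `temperature` would follow the cooling schedule; for MPM it stays at 1 and the label with the largest counter is read out at the end. A second-order (8-connected) neighborhood would need more than two groups.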

V.  Stochastic  Learning  and  Neural  Networks 

In the previous sections purely deterministic and stochastic
relaxation algorithms were discussed. Each has its
own advantages and disadvantages. Here we consider the
possibility of combining the two methods using stochastic
learning automata and we compare the results obtained by
this new scheme with those of previous algorithms.

We begin with a brief introduction to the stochastic
learning automaton [12]. An automaton is a decision
maker operating in a random environment. A stochastic
automaton can be defined by a quadruple $(\alpha, Q, T, R)$,
where $\alpha = \{\alpha_1, \ldots, \alpha_N\}$ is the set of available actions
to the automaton. The action selected at time $t$ is denoted
by $\alpha(t)$. $Q(t)$ is the state of the automaton at time $t$ and
consists of the action probability vector $p(t) = [p_1(t), \ldots, p_N(t)]$,
where $p_i(t) = \mathrm{prob}(\alpha(t) = \alpha_i)$ and $\sum_i p_i(t) = 1\ \forall t$.
The environment responds to the action
$\alpha(t)$ with a $\lambda(t) \in R$, $R$ being the set of the environment's
responses. The state transitions of the automaton are governed
by the learning algorithm $T$, $Q(t + 1) = T(Q(t), \alpha(t), \lambda(t))$.
Without loss of generality, it can be assumed
that $R = [0, 1]$; i.e., the responses are normalized
to lie in the interval [0, 1], 1 indicating a complete success
and 0 total failure. The goal of the automaton is to
converge to the optimal action, i.e., the action which results
in the maximum expected reward. Again without loss
of generality let $\alpha_1$ be the optimal action and
$d_1 = E[\lambda(t) \mid \alpha(t) = \alpha_1] = \max_i \{E[\lambda(t) \mid \alpha_i]\}$. At present no
learning algorithms exist which are optimal in the above
sense. However we can choose the parameters of certain
learning algorithms so as to realize a response as close to
the optimum as desired. This condition is called $\epsilon$-optimality.
If $M(t) = E[\lambda(t) \mid p(t)]$, then a learning algorithm
is said to be $\epsilon$-optimal if it results in an $M(t)$ such
that

$$\lim_{t \to \infty} E[M(t)] > d_1 - \epsilon \qquad (18)$$

for a suitable choice of parameters and for any $\epsilon > 0$.
One of the simplest learning schemes is the linear reward-inaction
($L_{R-I}$) rule. Suppose at time $t$ we have $\alpha(t) = \alpha_i$;
if $\lambda(t)$ is the response received, then according to the
$L_{R-I}$ rule

$$p_i(t + 1) = p_i(t) + a\lambda(t)\,[1 - p_i(t)]$$
$$p_j(t + 1) = p_j(t)\,[1 - a\lambda(t)], \quad \forall j \ne i \qquad (19)$$

where $a$ is a parameter of the algorithm controlling the
learning rate. Typical values for $a$ are in the range 0.01-0.1.
It can be shown that this $L_{R-I}$ rule is $\epsilon$-optimal in all
stationary environments; i.e., there exists a value for the
parameter $a$ so that condition (18) is satisfied.
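
As a minimal sketch of update (19), the function below applies one linear reward-inaction step to an action probability vector. The name `lri_update` and its argument names are illustrative, not taken from the source.

```python
def lri_update(p, chosen, response, rate):
    """Linear reward-inaction update of an action probability vector.

    p        : list of action probabilities (sums to 1)
    chosen   : index i of the action taken at time t
    response : environment response lambda(t) in [0, 1]
    rate     : learning-rate parameter a
    """
    step = rate * response
    # Every action is scaled down by (1 - a * lambda) ...
    new_p = [pj * (1.0 - step) for pj in p]
    # ... and the chosen action is moved toward 1 instead.
    new_p[chosen] = p[chosen] + step * (1.0 - p[chosen])
    return new_p

# Four equally likely actions; action 2 is taken and fully rewarded.
updated = lri_update([0.25, 0.25, 0.25, 0.25], chosen=2, response=1.0, rate=0.05)
```

The update keeps the vector normalized, and a response of 0 leaves it unchanged, which is the "inaction" part of the rule.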

Collective behavior of a group of automata has also
been studied. Consider a team of $N$ automata $A_i$ ($i = 1, \ldots, N$),
each having $r_i$ actions $\alpha^i = \{\alpha^i_1, \ldots, \alpha^i_{r_i}\}$.
At any instant $t$ each member of the team makes a decision
$\alpha^i(t)$. The environment responds to this by sending
a reinforcement signal $\lambda(t)$ to all the automata in the
group. This situation represents a cooperative game
among a team of automata with an identical payoff. All
the automata update their action probability vectors according
to (19) using the same learning rate, and the process
repeats. Local convergence results can be obtained
for the case of stationary random environments. Variations
of this rule have been applied to complex problems
such as decentralized control of Markov chains [21] and
relaxation labeling [22].

The texture classification discussed in the previous sections
can be treated as a relaxation labeling problem and
stochastic automata can be used to learn the labels (texture
class) for the pixels. A learning automaton is assigned
to each of the pixel sites in the image. The actions
of the automata correspond to selecting a label for the
pixel site to which it is assigned. Thus for a $K$-class problem
each automaton has $K$ actions and a probability distribution
over this action set. Initially the labels are assigned
randomly with equal probability. Since the number
of automata involved is very large, it is not practical to
update the action probability vector at each iteration. Instead
we combine the iterations of the neural network described
in the previous section with the stochastic learning
algorithm. This results in an iterative hill-climbing-type
algorithm which combines the fast convergence of deterministic
relaxation with the sustained exploration of the
stochastic algorithm. The stochastic part prevents the algorithm
from getting trapped in local minima and at the
same time "learns" from the search by updating the state
probabilities. However, in contrast to simulated annealing,
we cannot guarantee convergence to the global optimum.
Each cycle now has two phases: the first consists
of the deterministic relaxation network converging to a
solution; the second consists of the learning network updating
its state, the new state being determined by the
equilibrium state of the relaxation network. A new initial
state is generated by the learning network depending on
its current state and the cycle repeats. Thus relaxation and
learning alternate with each other. After each iteration the
probability of the more stable states increases and because
of the stochastic nature of the algorithm the possibility of
getting trapped in bad local minima is reduced. The algorithm
is summarized below.

A.  Learning  Algorithm 

Let the pixel site be denoted by $s \in \Omega$ and the number
of texture classes be $K$. Let $A_s$ be the automaton assigned
to site $s$ and the action probability vector of $A_s$ be
$p_s(t) = [p_{s,1}(t), \ldots, p_{s,K}(t)]$ and $\sum_l p_{s,l}(t) = 1\ \forall s, t$, where
$p_{s,l}(t) = \mathrm{prob}(\text{label of site } s = l)$. The steps in the
algorithm are as follows:

1) Initialize the action probability vectors of all the automata

$$p_{s,l}(0) = 1/K, \quad \forall s, l.$$

Initialize the iteration counter to 0.

2) Choose an initial label configuration sampled from
the distribution of these probability vectors.

3) Start the neural network of Section III with this configuration.

4) Let $l_s$ denote the label for site $s$ at equilibrium. Let
the current time (iteration number) be $t$. Then the
action probabilities are updated as follows:

$$p_{s,l_s}(t + 1) = p_{s,l_s}(t) + a\lambda(t)\,[1 - p_{s,l_s}(t)]$$
$$p_{s,j}(t + 1) = p_{s,j}(t)\,[1 - a\lambda(t)], \quad \forall j \ne l_s \text{ and } \forall s. \qquad (20)$$

The response $\lambda(t)$ is derived as follows: Suppose the
present label configuration resulted in a lower energy
state than the previous one. Then it results in
$\lambda(t) = \lambda_1$, and if the energy increases we have $\lambda(t) = \lambda_2$
with $\lambda_1 > \lambda_2$. In our simulations we used $\lambda_1 = 1$
and $\lambda_2 = 0.25$.

5) Generate a new configuration from these updated label
probabilities, increment the iteration counter,
and go to step 3.

Thus the system consists of two layers, one for relaxation
and the other for learning. The relaxation network
is similar to the one considered in Section III, the only
difference being that the initial state is decided by the
learning network. The learning network consists of a team
of automata and learning takes place at a much lower
speed than the relaxation, with fewer updatings. The
probabilities of the labels corresponding to the final state
of the relaxation network are increased according to (20).
Using these new probabilities a new configuration is generated.
Since the response does not depend on time, this
corresponds to a stationary environment, and as we have
noted before this $L_{R-I}$ algorithm can be shown to converge
to a stationary point, not necessarily to the global
optimum.
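
The two-layer scheme can be sketched as follows, under several assumptions that are not taken from the source: the relaxation network of Section III is replaced by a hypothetical greedy site-wise descent (`greedy_relax`) on a toy one-dimensional chain energy (`chain_energy`), the first cycle is treated as a reward because it has no predecessor, and an unchanged energy is treated like an increase. The defaults mirror the parameter values quoted in the text ($a = 0.05$, $\lambda_1 = 1$, $\lambda_2 = 0.25$).

```python
import random

def lri_update(p, chosen, response, rate):
    """Update (20): reinforce the chosen label, scale down the others."""
    step = rate * response
    new_p = [pj * (1.0 - step) for pj in p]
    new_p[chosen] = p[chosen] + step * (1.0 - p[chosen])
    return new_p

def sample_labels(probs, rng):
    """Draw one label per site from that site's action probability vector."""
    return [rng.choices(range(len(p)), weights=p)[0] for p in probs]

def learn_labels(num_sites, num_classes, relax, energy, cycles,
                 rate=0.05, reward=1.0, penalty=0.25, seed=0):
    rng = random.Random(seed)
    # Step 1: uniform action probabilities for every automaton.
    probs = [[1.0 / num_classes] * num_classes for _ in range(num_sites)]
    # Step 2: initial configuration sampled from these vectors.
    labels = sample_labels(probs, rng)
    equilibrium, previous_energy = labels, None
    for _ in range(cycles):
        # Step 3: run the relaxation to equilibrium.
        equilibrium = relax(labels)
        current_energy = energy(equilibrium)
        # Step 4: `reward` if the energy decreased, otherwise `penalty`.
        # Assumption: the first cycle, which has no predecessor, is rewarded.
        lowered = previous_energy is None or current_energy < previous_energy
        response = reward if lowered else penalty
        probs = [lri_update(p, l, response, rate)
                 for p, l in zip(probs, equilibrium)]
        previous_energy = current_energy
        # Step 5: new starting configuration from the updated probabilities.
        labels = sample_labels(probs, rng)
    return probs, equilibrium

def chain_energy(labels, cost, beta):
    """Toy energy: data cost minus beta for each pair of agreeing neighbours."""
    data = sum(cost[s][l] for s, l in enumerate(labels))
    agree = sum(1 for a, b in zip(labels, labels[1:]) if a == b)
    return data - beta * agree

def greedy_relax(labels, cost, beta):
    """Hypothetical stand-in for the relaxation network: site-wise descent."""
    labels = list(labels)
    changed = True
    while changed:
        changed = False
        for s in range(len(labels)):
            for l in range(len(cost[s])):
                trial = labels[:s] + [l] + labels[s + 1:]
                if chain_energy(trial, cost, beta) < chain_energy(labels, cost, beta):
                    labels, changed = trial, True
    return labels

# Toy problem: four sites favour label 0, four favour label 1.
toy_cost = [[0.0, 3.0]] * 4 + [[3.0, 0.0]] * 4
probs, equilibrium = learn_labels(
    8, 2,
    relax=lambda x: greedy_relax(x, toy_cost, 1.0),
    energy=lambda x: chain_energy(x, toy_cost, 1.0),
    cycles=50)
```

Passing `relax` and `energy` as arguments keeps the learning layer independent of the relaxation layer, which matches the two-layer description above; any other relaxation procedure could be substituted.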

VI.  Experimental  Results  and  Conclusions 

The segmentation results using the above algorithms are
given on two examples. The parameters $\sigma_l$ and $\Theta_l$ corresponding
to the fourth-order GMRF for each texture class
were precomputed from 64 x 64 images of the textures.
The local mean (in an 11 x 11 window) was first subtracted
to obtain the zero mean texture, and the least-square
estimates [17] of the parameters were then computed
from the interior of the image. The first step in the
segmentation process involves computing the Gibbs energies
$U_1(Y_s^* \mid L_s)$ in (3). This is done for each texture class
and the results are stored. For computational convenience
these $U_1(\cdot)$ values are normalized by dividing by $k^2$,
where $k$ is the size of the window. To ignore the boundary
effects, we set $U_1 = 0$ at the boundaries. We have experimented
with different window sizes; larger windows result
in more homogeneous texture patches but the boundaries
between the textures are distorted. The results
reported here are based on windows of size 11 x 11 pixels.
The bias term $w(l_s)$ can be estimated using the histogram
of the image data [18] but we obtained these values
by trial and error.

In Section IV we observed that neighboring pixel sites
should not be updated simultaneously. This problem occurs
only if digital implementations of the networks are
considered, as the probability of this happening in an analog
network is zero. When this simultaneous updating
was tested for the deterministic case, it always converged
to limit cycles of length 2. (In fact it can be shown that
the system converges to limit cycles of length at most 2.)
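
The length-2 limit cycle is easy to reproduce on a toy problem. The sketch below is an illustration under simplifying assumptions, not the network of Section III: it uses a one-dimensional chain with two labels, no data term, and a rule in which each site takes the label shared by most of its neighbors (ties go to the lowest label index). The function names are hypothetical.

```python
def best_label(neighbour_labels, num_labels):
    """Label agreeing with the most neighbours (lowest index wins ties)."""
    return max(range(num_labels),
               key=lambda l: (neighbour_labels.count(l), -l))

def synchronous_step(labels, num_labels=2):
    """Update every site of a 1-D chain at once from the old configuration."""
    n = len(labels)
    return [best_label([labels[t] for t in (s - 1, s + 1) if 0 <= t < n],
                       num_labels)
            for s in range(n)]

def sequential_sweep(labels, num_labels=2):
    """Update sites one at a time, each seeing its neighbours' latest labels."""
    labels = list(labels)
    n = len(labels)
    for s in range(n):
        labels[s] = best_label(
            [labels[t] for t in (s - 1, s + 1) if 0 <= t < n], num_labels)
    return labels

start = [0, 1, 0, 1]
flipped = synchronous_step(start)      # every site copies its neighbours
restored = synchronous_step(flipped)   # and the pattern flips back again
settled = sequential_sweep(sequential_sweep(start))
```

With simultaneous updates the alternating pattern oscillates between two configurations, whereas updating one site at a time settles to a fixed point.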

The choice of $\beta$ plays an important role in the segmentation
process and its value depends on the magnitude of
the energy function $U_1(\cdot)$. Various values of $\beta$ ranging
from 0.2 to 3.0 were used in the experiments. In the deterministic
algorithm it is preferable to start with a small $\beta$
and increase it gradually. Large values of $\beta$ usually degrade
the performance. We also observed that slowly increasing
$\beta$ during the iterations improves the results for


1046 IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. 38, NO. 6, JUNE 1990


the stochastic algorithms. It should be noted that using a
larger value of $\beta$ for the deterministic algorithm (compared
to those used in the stochastic algorithms) does not
improve the performance.

The nature of the segmentation results depends on the
order of the label model. It is preferable to choose the
first-order model for the stochastic algorithms if we know
a priori that the boundaries are either horizontal or vertical.
However, for the deterministic rule and the learning
scheme the second-order model results in more homogeneous
classification.

The MPM algorithm requires the statistics obtained
from the invariant measure of the Markov chain corresponding
to the sampling algorithm. Hence it is preferable
to ignore the first few hundred trials before starting to
gather the statistics. The performance of the deterministic
relaxation rule of Section III also depends on the initial
state and we have looked into two different initial conditions.
The first one starts with a label configuration $L$ such
that $L_s = l_s$ if $U_1(Y_s^* \mid l_s) = \min_{l_k} \{U_1(Y_s^* \mid l_k)\}$. This
corresponds to maximizing the probability $P(Y^* \mid L)$
[23]. The second choice for the initial configuration is a
randomly generated label set. Results for both cases are
provided and we observe that the random choice often
leads to better results.

In the examples below the following learning parameters
were used: learning rate $a = 0.05$ and reward/penalty
parameters $\lambda_1 = 1.0$ and $\lambda_2 = 0.25$.

Example 1: This is a two-class problem consisting of
grass and calf textures. The image is of size 128 x 128
and is shown in Fig. 2(a). In Fig. 2(b) the classification
obtained by the deterministic algorithm discussed in Section
III is shown. The maximum likelihood estimate was
the initial state for the network, and Fig. 2(c) gives the
result with random initial configuration. Notice that in this
case the final result has fewer misclassified regions than
in Fig. 2(b) and this was observed to be true in general.
Parts (d) and (e) of the figure give the MAP solution using
simulated annealing and the MPM solution respectively.
The result of the learning algorithm is shown in Fig. 2(f)
and there are no misclassifications within the homogeneous
regions. However the boundary is not as good as
those of the MAP or MPM solutions. In all the cases we
used $\beta = 0.6$.

Example 2: This is a 256 x 256 image (Fig. 3(a)) having
six textures: calf, grass, wool, wood, pig skin, and
sand. This is a difficult problem in the sense that three of
the textures (wool, pig skin, and sand) have almost identical
characteristics and are not easily distinguishable,
even by the human eye. The maximum likelihood solution
is shown in Fig. 3(b), and part (c) of the figure is the
solution obtained by the deterministic relaxation network
with the result in part (b) as the initial condition. Fig. 3(d)
gives the result with random initial configuration. The
MAP solution using simulated annealing is shown in part
(e). As was mentioned in Section IV-A, cycling of
temperature improves the performance of simulated annealing.
The segmentation result was obtained by starting
with an initial temperature $T_0 = 2.0$ and cooling according
to the schedule (16) for 300 iterations. Then the system
was reset to $T_0 = 1.5$ and the process was repeated
for 300 more iterations. In the case of the MPM rule the
first 500 iterations were ignored and Fig. 3(f) shows the
result obtained using the last 200 iterations. As in the previous
example, the best results were obtained by the simulated
annealing and MPM algorithms. For the MPM case
there were no misclassifications within homogeneous regions
but the boundaries were not accurate. In fact, as
indicated in Table I, simulated annealing has the lowest
percentage error in classification. Introducing learning in
deterministic relaxation considerably improves the performance
(Fig. 3(g)). Table I gives the percentage classification
error for the different cases.

It is noted from the table that although learning improves
the performance of the deterministic network algorithm,
the best results were obtained by the simulated
annealing technique, which is to be expected.


A.  Hierarchical  Segmentation 


The various segmentation algorithms described in the
previous sections can be easily extended to hierarchical
structures wherein the segmentation is carried out at different
levels, from coarse to fine. The energy functions
are modified to take care of the coupling between the adjacent
resolutions of the system. Consider a $K$-stage hierarchical
system, with stage 0 representing the maximum
resolution level and stage $K - 1$ being the coarsest level.
The energy corresponding to the $k$th stage is denoted by
$U_1^k(s, l)$ and $U_2^k(s)$ (eqs. (3) and (4)). The size of the
window used in computing the joint energy potential
$U_1^k(\cdot)$ increases with the index $k$. The potential $U_2^k$ is
modified to take care of the coupling as follows:

$$U_2^k(s) = -\beta \sum_{t' \in D_k(s)} \delta(L_k(s) - L_k(t')) + \beta_k \big(1 - \delta(L_k(s) - L_{k+1}(s))\big) + w(L_k(s)), \quad 0 \le k < K - 1 \qquad (21)$$

where $L_k(s)$ is the label for the site $s$ in the $k$th stage, and
$\beta_k$ is the coupling coefficient between the stages $k + 1$
and $k$. $D_k(s)$ is the appropriate neighborhood set for the
$k$th stage. The result of segmentation on the six-class
problem with $K = 2$ and using the learning algorithm is
shown in Fig. 3(h).


B.  Conclusion 

In this paper we have looked into different texture segmentation
algorithms based on modeling the texture intensities
as a GMRF. It is observed that a large class of
natural textures can be modeled in this way. The performance
of several algorithms for texture segmentation is
studied. The stochastic algorithms obtain nearly optimal
results, as can be seen from the examples. We noted that
the MRF model helps us to trivially map the optimization
problem onto a Hopfield-type neural network. This deterministic
relaxation network converges extremely fast to a
solution, typically in 20-30 iterations for the 256 x 256
image. Its performance, however, is sensitive to the initial
state of the system and often is not very satisfactory.
To overcome the disadvantages of the network, a new algorithm,
which introduces stochastic learning into the iterations
of the network, was proposed. This helps to
maintain a sustained search of the solution space while
learning from the past experience. This algorithm combines
the advantages of deterministic and stochastic relaxation
schemes and it would be interesting to explore its
performance in solving other computationally hard problems
in computer vision.

Fig. 2. (a) Original image consisting of two textures. The classification using different algorithms is shown in (b)-(f). (The
textures are coded by gray levels.) (b) Deterministic relaxation with maximum likelihood solution as initial condition and (c)
with random initial condition. (d) MAP estimate using simulated annealing. (e) MPM solution. (f) Network with stochastic
learning.

Fig. 3. (a) Original image consisting of six textures. (b) Maximum likelihood solution. (c) Deterministic relaxation with (b) as
initial condition and (d) with random initial condition. (e) MAP estimate using simulated annealing. (f) MPM solution. (g)
Network with stochastic learning. (h) Hierarchical network solution.

Table I. Percentage classification error.

Algorithm                                  Percentage Error
Maximum Likelihood Estimate                22.17
Neural network (MLE as initial state)      16.25
Neural network (Random initial state)      14.74
Simulated annealing (MAP)                  6.72
MPM algorithm                              7.05
Neural network with learning               8.7
Hierarchical network                       8.21

A Note on Unsupervised Texture Segmentation 1

B.S.  Manjunath  and  R.  Chellappa 

Department  of  EE-Systems 
University  of  Southern  California 
Los  Angeles,  California  90089 
Abstract 

We consider the problem of unsupervised segmentation of textured images. The only
explicit assumption made is that the intensity data can be modeled by a Gauss Markov
Random Field (GMRF). The image is divided into a number of non-overlapping regions
and the GMRF parameters are computed from each of these regions. A simple clustering
method is used to merge these regions. The parameters of the model estimated from the
clustered segments are then used in two different schemes, one being an approximation to
the maximum a posteriori estimate of the labels and the other minimizing the percentage
misclassification error. Our approach is contrasted with a recently published algorithm
[1] which detailed an interesting simultaneous parameter estimation and segmentation
scheme. We compare the results of the adaptive segmentation algorithm in [1] with a simple
nearest neighbor classification scheme to show that if enough information is available,
simple techniques could be used as alternatives to computationally expensive schemes.

1 Partially supported by the AFOSR grant 86-0196.

1 Introduction


Segmenting a textured scene into different classes in the absence of a priori information is
still an unsolved issue in computer vision. The main difficulty is that the model and its parameters
are unknown and need to be computed from the given image before segmentation.
To compute the parameters effectively, the segmented image itself is needed! Simultaneous
parameter estimation and segmentation is often computationally prohibitive. An alternate
approach to this problem is to have a two step process: first estimate the parameters in
small regions and obtain a crude segmentation, then estimate the parameters again from
this segmented image and use pixel based segmentation schemes ([2], [3]). In this paper we
assume that the texture intensity distribution can be modeled by a second order GMRF.
Hence the problem is in estimating these GMRF parameters and segmenting the textures
based on the estimated values.

Unsupervised texture segmentation is not a new problem. Some of the recent work has
been reported in [1], [4] and [5]. Lakshmanan and Derin [1] in a recent paper address the
problem of simultaneous estimation and segmentation of Gibbs Random Fields (GRF).
They obtained an interesting convergence result for the Maximum Likelihood Estimates
(MLE) of the parameters and maximum a posteriori probability (MAP) solution for the
segmentation. We give a brief description of their model in section 5 and experimental
results to illustrate that if one makes the same assumptions, a simple nearest neighbor
classification rule produces results very close to those obtained using simulated annealing
as in [1]. In [4], no specific texture model is assumed. Certain features are extracted from
the sub-images and the image is segmented based on the disparity measure between the
feature vectors from different sub-images.

The approach to texture segmentation presented here is similar to the work of Cohen
and Fan [5]. In [5] the textures are modeled as second order GMRF and the texture
parameters are estimated from disjoint windows. The windows are later grouped based
on clustering analysis. Finer segmentation is obtained by using the parameters from the
coarse segmentation in a suitable relaxation algorithm [6].

In the next section we give a brief description of the texture model. Section 3 details
the segmentation scheme and the experimental results are provided in section 4. In section
5, the adaptive segmentation scheme of [1] is discussed along with the results of a simple
nearest neighbor classification scheme.


2  Texture  model 

The GMRF model for textures has been used by many researchers [7]. In this paper
we consider a second order GMRF model for the conditional probability density of the
intensity given the texture label.

Let $\Omega$ denote the set of grid points in the M x M lattice, i.e., $\Omega = \{(i, j),\ 1 \le i, j \le M\}$.
Let $\{L_s, s \in \Omega\}$ and $\{Y_s, s \in \Omega\}$ denote the labels and zero mean gray level arrays
respectively. Let $N_s$ be the symmetric second order neighborhood of a site $s$ (consisting
of the 8 nearest neighbors of $s$). Then assuming that all the neighbors of $s$ also have the
same label as that of $s$, we can write the following expression for the conditional density
of the intensity at the pixel site $s$:

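$$P(Y_s = y_s \mid Y_r = y_r,\ r \in N_s,\ L_s = l) = \frac{e^{-U(Y_s = y_s \mid Y_r = y_r,\ r \in N_s,\ L_s = l)}}{Z(l \mid y_r,\ r \in N_s)} \tag{1}$$

where $Z(l \mid y_r,\ r \in N_s)$ is the partition function of the conditional Gibbs distribution and

$$U(Y_s = y_s \mid Y_r = y_r,\ r \in N_s,\ L_s = l) = \frac{1}{2\sigma_l^2} \Big( y_s^2 - 2 \sum_{r \in N_s} \Theta_{s,r}^l\, y_s y_r \Big) \tag{2}$$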


In (2), $\sigma_l$ and $\Theta^l$ are the GMRF model parameters of the $l$-th texture class. The model
parameters satisfy $\Theta_{r,s}^l = \Theta_{r-s}^l = \Theta_{s-r}^l = \Theta_\tau^l$.

Further, the joint probability in a window $W_s$ centered at $s$ can be written as

$$P(y_s^* \mid L_s = l) = \frac{e^{-U_1(y_s^* \mid L_s = l)}}{Z_1(l)} \tag{3}$$

where $Z_1(l)$ is the partition function and

$$U_1(y_s^* \mid L_s = l) = \frac{1}{2\sigma_l^2} \sum_{r \in W_s} \Big\{ y_r^2 - \sum_{\tau \in N^* \mid r+\tau \in W_s} \Theta_\tau^l\, y_r\, (y_{r+\tau} + y_{r-\tau}) \Big\}$$

$y_s^*$ represents the intensity array in the window $W_s$. The above equation assumes a free
boundary model. $N^*$ is the set of shift vectors corresponding to the second order GMRF
model,

$$N^* = \{(0,1),\ (1,0),\ (1,1),\ (-1,1)\} \tag{4}$$

2.1  GMRF  Parameter  Estimation 

There  are  many  existing  methods  for  estimating  the  GMRF  parameters,  but  none  of  them 
can  guarantee  both  consistency  (estimates  converging  to  the  true  values  of  the  parameters) 
and  stability  (the  covariance  matrix  in  the  expression  for  the  joint  probability  density  of 
the  MRF  must  be  positive  definite)  together.  Normally  an  optimization  algorithm  is  run 
to  obtain  the  stable  estimates.  Here  we  consider  the  least  square  estimates  of  the  GMRF 
parameters  [8].  Since  our  main  interest  is  in  obtaining  reasonably  good  measures  to  aid  the 
segmentation  process  and  not  in  synthesizing  the  textures,  we  do  not  check  for  the  stability 
of  the  estimates  obtained.  One  can  instead  use  the  maximum  likelihood  estimates  [9],  but 
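it is computationally more expensive. Consider a region of size $N \times N$ containing a single
texture. Let $\Omega$ be the lattice under consideration and let $\Omega_I$ be the interior region of $\Omega$,
i.e.,

$$\Omega_I = \Omega - \Omega_B, \quad \Omega_B = \{s = (i,j),\ s \in \Omega \text{ and } s \pm \tau \notin \Omega \text{ for at least some } \tau \in N^*\} \tag{5}$$

Let

$$Q_s = [\, y_{s+\tau_1} + y_{s-\tau_1},\ \ldots,\ y_{s+\tau_4} + y_{s-\tau_4} \,]^T \tag{6}$$

Then the least square estimates of the parameters are

$$\hat{\Theta} = \Big[ \sum_{\Omega_I} Q_s Q_s^T \Big]^{-1} \Big[ \sum_{\Omega_I} Q_s\, y_s \Big] \tag{7}$$

$$\hat{\sigma}^2 = \frac{1}{N^2} \sum_{\Omega_I} \big[ y_s - \hat{\Theta}^T Q_s \big]^2 \tag{8}$$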

If $\mu$ is the mean of the subimage, then the feature vector for the region is denoted by

$$F = (f_1, f_2, f_3, f_4, f_5, f_6) = (\Theta_1, \Theta_2, \Theta_3, \Theta_4, \mu, \sigma^2) \tag{9}$$
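The estimation step above can be sketched in Python with NumPy. This is only a minimal illustration under stated assumptions, not the implementation used for the experiments: the function name `gmrf_features` is hypothetical, and `numpy.linalg.lstsq` is used here to solve the least squares problem whose normal equations are (7). Since every shift in $N^*$ moves a site by at most one pixel in each direction, the boundary region of (5) is the one pixel border of the subimage.

```python
import numpy as np

# Shift vectors of the second order GMRF neighborhood, as in (4).
SHIFTS = [(0, 1), (1, 0), (1, 1), (-1, 1)]


def gmrf_features(region):
    """Return (theta_1..theta_4, mean, variance) for one subimage.

    Hypothetical helper: least squares estimate in the spirit of (6)-(9).
    """
    region = np.asarray(region, dtype=float)
    mu = region.mean()
    y = region - mu                      # zero mean gray levels
    n_rows, n_cols = y.shape

    # Interior sites: s + tau and s - tau stay inside the lattice.
    ys = y[1:-1, 1:-1].ravel()

    # Build one column of Q per shift vector: y_{s+tau} + y_{s-tau}.
    q_cols = []
    for di, dj in SHIFTS:
        plus = y[1 + di:n_rows - 1 + di, 1 + dj:n_cols - 1 + dj]
        minus = y[1 - di:n_rows - 1 - di, 1 - dj:n_cols - 1 - dj]
        q_cols.append((plus + minus).ravel())
    Q = np.stack(q_cols, axis=1)         # row k holds Q_s^T for interior site k

    # Least squares solution; equals (7) whenever sum Q_s Q_s^T is invertible.
    theta = np.linalg.lstsq(Q, ys, rcond=None)[0]

    # Residual variance, normalized by the number of pixels as in (8).
    sigma2 = np.sum((ys - Q @ theta) ** 2) / (n_rows * n_cols)
    return np.concatenate([theta, [mu, sigma2]])
```

Calling `gmrf_features` on each 32x32 subimage would give one six-component feature vector per subimage. As in the text, no stability check is performed on the estimates.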

Label field: The label field is modeled as a second order discrete MRF. It does not
play any role in parameter estimation or in obtaining the initial coarse segmentation. The
label field is characterized by a single parameter $\beta$ which determines the bonding between
different regions in the image.


3  Segmentation 


3.1 Clustering

The given image is divided into a number of non-overlapping subimages. For each of these
subimages the corresponding feature vectors are estimated as described in the previous
section. It is assumed that all these subimages are homogeneous. A normalized Euclidean
distance measure is defined for these vectors as

$$d(F^i, F^j) = \sum_k \frac{(f_k^i - f_k^j)^2}{(f_k^i)^2 + (f_k^j)^2} \tag{10}$$

A simple clustering is done based on this distance measure. First the maximum distance
between any two regions in the image is found as

$$d_{max} = \max_{i,j}\, d(F^i, F^j)$$

The regions are now grouped such that any two subimages $i$ and $j$ belonging to the same
class satisfy

$$d(F^i, F^j) < p\, d_{max} \tag{11}$$

where p is a clustering parameter. Since p affects the number of clusters that are formed, a
good guess of p should be based on the knowledge about the approximate number of classes
present in the image. In our experiments we used a simple heuristic p = 1/(approximate
number of classes). In the above clustering process all isolated regions are marked as
ambiguous. Also all regions which satisfy the criterion (11) for two different classes should
be labelled ambiguous. Usually the boundary regions which have more than one texture
inside fall into this class. Note that alternate schemes like k-means clustering can also be
used in obtaining such a coarse segmentation.
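The distance measure and the grouping criterion (11) can be illustrated with a short Python sketch. The function names are hypothetical, the distance is a sum of squared component differences, each normalized by the sum of the squared components, and the sketch assumes that no feature component is zero in both vectors. It only reports which pairs of subimages satisfy (11); how the pairs are merged into classes and how ambiguous regions are marked is left open here.

```python
import numpy as np


def feature_distance(fi, fj):
    """Normalized distance between two feature vectors (assumed form of the measure)."""
    fi = np.asarray(fi, dtype=float)
    fj = np.asarray(fj, dtype=float)
    return float(np.sum((fi - fj) ** 2 / (fi ** 2 + fj ** 2)))


def pairwise_criterion(features, p):
    """Return the distance matrix and a boolean matrix of pairs with d < p * d_max."""
    n = len(features)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = feature_distance(features[i], features[j])
    return d, d < p * d.max()
```

For an image with roughly two classes, the heuristic in the text corresponds to calling `pairwise_criterion(features, p=0.5)`.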

From  these  clustered  regions  the  parameters  are  recomputed.  These  parameters  are 
then  used  in  pixel  based  segmentation  algorithms  [3]  to  obtain  finer  segmentation. 

3.2  Deterministic  and  stochastic  algorithms 

3.2.1  Deterministic  relaxation 

Assuming  that  the  parameters  of  the  model  and  the  number  of  classes  in  the  image  are 
known,  the  texture  segmentation  problem  can  be  formulated  as  a  minimization  problem. 
Further, for the case when the model is an MRF, mapping this problem onto a relaxation
network is straightforward. The function to be minimized can be written as [3]

$$E = \frac{1}{2} \sum_s \sum_l U(s,l)\, V_{sl} - \frac{\beta}{2} \sum_l \sum_s \sum_{s' \in N_s} V_{s'l}\, V_{sl} \tag{12}$$

where $N_s$ is the second order neighborhood of site $s$ and $\{V_{sl}\}$ are variables taking on
values from {0,1}. If $V_{sl}$ is 1, it indicates that the site $s$ belongs to class $l$. Note that
for each $s$, only one $V_{sl}$ has a value one and all others are zero. $\beta$ represents the binding
between textures of the same class and characterizes the initial distribution of the class
labels. $U(s,l)$ includes all the information regarding the intensity and parameter values
for the site $s \in$ class $l$. It gives a measure of the joint distribution of the intensities in a
small window $W_s$ centered at $s$ and, for the case when all pixels inside the window belong
to class $l$, it is given by

$$U(s,l) = w(l) + U_1(Y_s^*, l) \tag{13}$$

where $U_1(.)$ is as in (3) and $w(l)$ is the bias corresponding to class $l$ [3]. This bias can
be estimated from the given data as we have a coarse initial segmentation to begin with.
$N^*$ is the set of shift vectors corresponding to the second order GMRF model as in (4).
Before starting the relaxation, we can selectively fix the labels of the pixels from which the
parameters are initially estimated, so that the relaxation process can be faster.

During each visit to site $s$, the class corresponding to the lowest energy $E$ is selected.
This is equivalent to setting the appropriate $V_{sl}$ to 1. The process is repeated until there
is no change in the energy $E$, i.e., until convergence. It can be shown that this relaxation
converges to a solution, which may not always be the best. This algorithm is similar to
the iterated conditional mode rule proposed by Besag [10].
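A minimal Python sketch of such a site-by-site relaxation is given below. It is an illustration under stated assumptions, not the implementation of [3]: the function name `relax_deterministic` is hypothetical, the data term `U` is assumed to be precomputed as an array of shape (rows, cols, classes), and the label dependent part of the energy at a site is taken to be half the data term minus $\beta$ times the number of neighbors carrying the same label. A label is changed only when this strictly lowers the local energy, so the total energy never increases.

```python
import numpy as np


def relax_deterministic(U, labels, beta, max_sweeps=50):
    """Raster-scan relaxation: pick the lowest local energy label at each site."""
    labels = np.array(labels, dtype=int)          # work on a copy
    rows, cols, n_classes = U.shape
    for _ in range(max_sweeps):
        changed = False
        for i in range(rows):
            for j in range(cols):
                # Count labels among the (up to 8) second order neighbors.
                counts = np.zeros(n_classes)
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ni, nj = i + di, j + dj
                        if (di or dj) and 0 <= ni < rows and 0 <= nj < cols:
                            counts[labels[ni, nj]] += 1
                # Assumed local energy for each candidate label at this site.
                local = 0.5 * U[i, j] - beta * counts
                best = int(np.argmin(local))
                if local[best] < local[labels[i, j]]:
                    labels[i, j] = best
                    changed = True
        if not changed:                           # no label changed: converged
            break
    return labels
```

Sites whose labels are to be kept fixed, as suggested above for the pixels used in the initial estimation, could be skipped in the inner loop; this is omitted to keep the sketch short.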

3.2.2  Stochastic  algorithms 


The alternative to deterministic relaxation is to update the class labels in a random way.
Simulated annealing can be used to get the MAP solution [2]. Here we consider another
criterion which minimizes the expected classification error per pixel (or alternatively,
maximizes the posterior marginal distribution) and use the algorithm suggested in [11] for this.
This algorithm is equivalent to running simulated annealing at a fixed temperature T = 1
(i.e., no annealing) and for details we refer to [3]. The final labels chosen correspond to the
most frequently selected ones. For convenience we refer to this as the MPM (Maximizing
the Posterior Marginal) algorithm in the following. We also implemented an algorithm
which combines the deterministic relaxation with stochastic learning [3]. This has the
advantage that it requires fewer iterations than simulated annealing, and
the results are better than using the deterministic relaxation alone. Learning is introduced
by defining a probability distribution over the class labels at each pixel site, and these
probabilities are updated at each convergence of the deterministic relaxation. A new starting
state for the relaxation is obtained by sampling from this updated probability distribution
and the process is repeated. Usually about 20-40 such learning cycles are enough to get
good results.
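The MPM rule described above can be sketched as a Gibbs sampler run at temperature T = 1 that counts how often each label is visited at each site. The Python code below is an illustration only, not the algorithm of [11] or [3] as implemented for the experiments: the function name `mpm_segment`, the precomputed data term `U` of shape (rows, cols, classes), the local energy (half the data term minus $\beta$ times the number of agreeing neighbors), and the burn-in period are all assumptions made for this sketch.

```python
import numpy as np


def mpm_segment(U, labels, beta, n_sweeps=100, burn_in=20, seed=0):
    """Sample labels at T = 1 and return the most frequently selected label per site."""
    rng = np.random.default_rng(seed)
    labels = np.array(labels, dtype=int)          # work on a copy
    rows, cols, n_classes = U.shape
    votes = np.zeros((rows, cols, n_classes), dtype=int)
    row_idx = np.arange(rows)[:, None]
    col_idx = np.arange(cols)[None, :]
    for sweep in range(n_sweeps):
        for i in range(rows):
            for j in range(cols):
                # Count labels among the (up to 8) second order neighbors.
                counts = np.zeros(n_classes)
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ni, nj = i + di, j + dj
                        if (di or dj) and 0 <= ni < rows and 0 <= nj < cols:
                            counts[labels[ni, nj]] += 1
                local = 0.5 * U[i, j] - beta * counts     # assumed local energy
                prob = np.exp(-(local - local.min()))     # Gibbs weights at T = 1
                prob /= prob.sum()
                labels[i, j] = rng.choice(n_classes, p=prob)
        if sweep >= burn_in:
            # Record the label currently held by every site.
            votes[row_idx, col_idx, labels] += 1
    return votes.argmax(axis=2)
```

Starting the sampler from the coarse segmentation, rather than from an arbitrary labeling, keeps the number of sweeps needed in this sketch small.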

4 Experimental Results


In the experiments described below, the subimage size was chosen to be 32x32. The
value of the clustering parameter depends on the number of texture classes present and
as mentioned earlier we used the heuristic p = 1/(approximate number of classes). To
eliminate very small isolated regions one can use a penalty function in the relaxation
algorithm which prohibits small clusters from being formed [4]. We found it convenient
to use a smoothing filter (size 3x3 or 5x5) to do the same. A useful observation is
that with this kind of "post-processing", the performance of both the deterministic and
stochastic relaxation algorithms is comparable. The results given below correspond to
those obtained after performing the smoothing. However, the boundaries obtained by the
stochastic algorithms are more accurate.

Example 1: (Grass and leather texture) Figure 1 shows this mosaic, which is a 128 x 128
image; here p = 0.5. Figure 1(b) shows the result of the coarse clustering described in section
3.1. Figure 1(c) is the result of the deterministic relaxation. This normally takes about
10-20 iterations. The result of using learning in the deterministic relaxation is shown in
Figure 1(d). About 10 learning cycles are used in this experiment. Figure 1(e) gives the
result for the MPM algorithm after about 500 iterations. The boundary obtained by the
MPM is the most accurate and also there are no misclassifications inside the homogeneous
regions.

Example  2:  (Grass,  Raffia  and  Wood)  Figure  2(a)  shows  this  image  and  Figure  2(b) 
gives  the  coarse  clustering  obtained  using  p  =  0.3.  Note  the  presence  of  an  ambiguous 
region  (dark  region  at  the  top),  which  could  not  be  classified  into  any  of  the  other  classes. 
The  results  of  the  various  algorithms  are  shown  in  Figure  2(c)-2(e).  Here  again  the  MPM 
gave the best result.

5 Comments on an adaptive segmentation algorithm


In [1], a simpler model based on GRF is used to model the intensity process. This model
can be summarized as:

$$Y_{ij} = X_{ij} + W_{ij} \tag{14}$$

where $Y_{ij}$ is the observed intensity at location $(i,j)$, $X_{ij}$ is the true intensity and $W_{ij}$ is
an independent identically distributed zero mean Gaussian noise and it is assumed that its
variance is known. Further, $X_{ij} \in \{s_1, \ldots, s_N\}$, $s_i$ being the intensity of the $i$-th region,
and both the $s_i$ and $N$ are assumed to be known. The process $X$ is modeled as an MRF taking one of
these $N$ values. The joint distribution of $X$ can be written as a Gibbs distribution and the
particular form of this used in [1] is called a Multilevel Logistic (MLL) distribution. Hence
the parameters correspond to this MLL distribution. A maximum likelihood estimate
of the parameters of the MLL distribution is computed and combined with simulated
annealing to obtain an optimum segmentation of the scene. A convergence result is also
proved for this adaptive segmentation scheme.

In  the  analysis  of  the  algorithm,  the  assumptions  made  play  an  important  role.  Even 
with  these  simple  assumptions,  due  to  computational  difficulties  further  approximations 
have  to  be  made.  For  example  in  the  above  scheme,  a  pseudo-likelihood  algorithm  is  used 
to  approximate  the  MLEs  to  avoid  the  computational  burden  involved  in  the  estimation 
of  MLE.  The  use  of  simulated  annealing  makes  the  algorithm  computationally  demanding. 
Further, if any of the assumptions made above (e.g., known number of regions, their
intensity values or known noise parameters) are relaxed [12], the resulting convergence may not
even be to a local optimum. Thus, even though the principle of simultaneous parameter
estimation and segmentation could be used in more general cases like the textured images
considered in this paper, it is not clear if it has any advantages compared to the scheme
detailed in this paper where we first estimate the parameters from windows to obtain a
coarse segmentation and then use pixel based schemes for finer segmentation.

5.1 A simple nearest-neighbor classification scheme

Adaptive  segmentation  should  be  data  driven,  but  at  the  same  time  we  should  make  use  of 
whatever  information  that  is  available  about  the  data  in  the  design  of  such  algorithms.  For 
example  in  this  paper  we  made  an  assumption  that  the  texture  intensities  could  be  modeled 
by  GMRF  which  simplified  the  parameter  estimation  significantly.  To  further  illustrate  the 
usefulness  of  the  prior  knowledge  about  data,  we  give  below  a  simple  classification  scheme 
which makes the same assumptions as in the adaptive segmentation scheme of [1] and does
not need expensive algorithms like simulated annealing to obtain comparable results. The
data is the same as the one used in [1]. Also the information about the noise variance is
not used in this segmentation operation. For obtaining the segmented image from a noisy
version of it, we used the following algorithm:

1. At each pixel site $(i,j)$, compute the average $\bar{Y}_{ij}$ in a small window (of size 3x3 in
our case) around the pixel $(i,j)$.

2. Then the intensity of the pixel is estimated as:

$$\hat{X}_{ij} = s_k, \quad |s_k - \bar{Y}_{ij}| = \min_l |s_l - \bar{Y}_{ij}| \tag{15}$$

3. Smooth the resulting texture by using a smoothing filter (similar to the one described
in the previous section).
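The first two steps of this scheme can be sketched in a few lines of Python. This is a minimal illustration, not the code used to produce the reported results: the function name `nearest_level_segment` is hypothetical, the image border is handled by replicating edge pixels (an assumption, since the border treatment is not specified), and the final smoothing of step 3 is omitted.

```python
import numpy as np


def nearest_level_segment(noisy, levels, window=3):
    """Average in a window, then assign each pixel the nearest known region intensity."""
    noisy = np.asarray(noisy, dtype=float)
    levels = np.asarray(levels, dtype=float)
    rows, cols = noisy.shape
    pad = window // 2
    padded = np.pad(noisy, pad, mode="edge")      # assumed border handling

    # Step 1: local average over a window x window neighborhood.
    avg = np.zeros_like(noisy)
    for di in range(window):
        for dj in range(window):
            avg += padded[di:di + rows, dj:dj + cols]
    avg /= window * window

    # Step 2: pick the known intensity closest to the local average.
    idx = np.argmin(np.abs(avg[..., None] - levels), axis=-1)
    return levels[idx]
```

For the two region image discussed below, the call would be `nearest_level_segment(noisy, [100, 150])`. Note that the noise variance is not an input, in line with the remark above.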

Figure  3  shows  the  performance  of  this  scheme  on  a  two  region  hand  drawn  image. 
Figure  3(a)  is  the  original  image  with  the  two  intensity  levels  being  100  and  150.  This  is 
one  of  the  images  used  in  [1].  Figure  3(b)  is  the  noisy  version  with  the  noise  being  additive 
i.i.d.  zero  mean  gaussian  with  standard  deviation  25  (Signal  to  noise  ratio  (SNR)  of  2). 
The  classification  result  we  obtained  is  shown  in  Figure  3(c)  with  the  classification  error 
of  1.73%.  Figures  3(d)  and  3(e)  show  the  results  when  the  noise  deviation  is  50  (SNR  1). 

Corresponding results for the four region case (with intensity values 100, 150, 200 and
250) are shown in Figure 4. The maximum difference in the performance of the nearest
neighbor classification rule to that reported in [1] is for the four region case with SNR 2,
where we obtained an error of 2.21% compared to 0.4% reported in [1]. Table 1 compares
the performance of this nearest neighbor classification scheme with the adaptive
segmentation algorithm of [1]. As far as the computation time required, this clustering technique
takes a few seconds of CPU time (for the 128x128 images, on a SUN-3 workstation) compared
to 15-30 minutes (on a VAX 8600) reported in [1].

Algorithm                                      2 Regions          4 Regions
                                               SNR 2    SNR 1     SNR 2    SNR 1
Adaptive segmentation from [1]                 0.96     3.88      0.40     1.98
Simple classification scheme described here    1.73     4.60      2.21     3.19

Table 1: Comparison of the adaptive segmentation algorithm in [1] with the nearest
neighbor classification scheme. The numbers indicate percentage classification error. SNR 2
corresponds to a noise standard deviation of 25 and SNR 1 corresponds to a deviation of
50.

As  can  be  seen  from  these  experiments,  complexity  of  the  segmentation  algorithms  can 
be  greatly  reduced  by  a  proper  use  of  prior  information  about  the  assumed  models.  The 
texture  model  considered  in  section  2  is  more  complicated  than  the  one  discussed  here  and 
the  only  explicit  assumption  made  was  that  the  textures  can  be  modeled  by  a  second  order 
GMRF.  Depending  on  the  textures,  this  may  or  may  not  be  a  valid  assumption.  However 
from  our  experience  with  different  real  textures  like  wood,  wool,  water  etc.,  this  appears 
to  be  a  good  approximation  and  our  experimental  results  also  support  this  fact.  We  also 
find  it  better  to  separate  the  estimation  stage  from  the  segmentation  stage.  However,  by 
separating estimation and segmentation, the algorithm will not lend itself to easy analysis.

6 Conclusions


Unsupervised  segmentation  is  a  difficult  problem.  Even  before  estimating  the  parameters 
of  any  assumed  model,  one  has  to  decide  whether  the  model  is  applicable  to  a  particular 
image or not. As we observed in the previous section, sometimes the choice of an appropriate
model plays a significant role. We feel that separating the estimation stage from segmentation
simplifies  the  problem  and  enables  computationally  manageable  algorithms.  In  cases  where 
the  number  of  textures  in  an  image  is  reasonably  small,  we  are  able  to  estimate  the  model 
parameters  and  segment  the  scene.  We  also  noticed  that  by  introducing  a  penalty  for 
the  small  regions,  which  is  equivalent  to  doing  simple  smoothing  operations,  deterministic 
relaxation  schemes  give  results  comparable  to  those  of  stochastic  techniques  like  simulated 
annealing and MPM.

Figure 2: Unsupervised segmentation of an image having three textures
(Grass, Raffia and Wood). (a) Original image (b) Coarse clustering. Note the
presence of an ambiguous region (darkest region at the top) (c) Deterministic
relaxation (d) with learning (e) MPM result

Figure 3: Segmentation of a two region hand drawn image using a nearest
neighbor classification rule. (a) original image. This is the same as the one
used in [1]. (b) with SNR 2 (standard deviation 25) (c) segmented image
from (b), (d) with SNR 1 (standard deviation 50) (e) segmented image from
(d).

Figure 4: Segmentation of a four region hand drawn image using a nearest
neighbor classification algorithm. (a) original image, same as the one used in
[1]. (b) with SNR 2 (standard deviation 25) (c) segmented image from (b),
(d) with SNR 1 (standard deviation 50) (e) segmented image from (d).

Abstract

The motion stereo method infers depth information from a sequence
of image frames. Both batch and recursive neural network
algorithms for motion stereo are presented. A discrete
neural network is used for representing the disparity field.

The batch algorithm first integrates information from all images
by embedding them into the bias inputs of the network.
Matching is then carried out by neuron evaluation. This algorithm
implements the matching procedure only once, unlike
conventional batch methods which require matching many times.

Existing recursive approaches use either a Kalman filter or a
recursive least squares algorithm to update the disparity values.
Due to the unmeasurable estimation error, the estimated disparity
values at each recursion are unreliable, yielding a noisy
disparity field. Instead, our method uses a recursive least squares
algorithm to update the bias inputs of the network. The disparity
values are uniquely determined by the neuron states after
matching. Since the neural network can be run in parallel and
the bias input updating scheme can be executed on line, a real
time vision system employing such an algorithm is very attractive.
A detection algorithm for locating occluding pixels is also
included. Experimental results using natural image sequences
are given.

1  Introduction 

Motion stereo is a method for deriving depth information from
either a moving camera or objects moving through a stationary
3-D environment. Since motion stereo uses more than two image
frames, it usually gives more accurate depth measurements
than static stereo which uses only two image frames. Applications
of motion stereo include the ALV project and industrial robot
vision systems. In this paper we present two neural network
algorithms, batch and recursive, for computing disparities using
a sequence of image frames based on the first order intensity
derivatives and chamfer distance values. The chamfer distance
value is defined as the distance from the non-edge pixel to the
nearest edge pixel [1].

A substantial amount of work has been devoted to methods
for computing the disparity field based on a sequence of images.
In a relatively early paper, Nevatia [2] uses multiple views between
two stereo views to achieve certain accuracy without an
increase in search time. The object is placed on a turntable and
multiple views are taken by a camera every 0.5 degree apart.
Two simple methods are suggested for region search. Both
methods only reduce the search range but do not increase the
resolution. Also they do not make use of any information acquired
in the previous view for the next search procedure. Williams
[3] presents a new approach for deriving depth from a moving
camera. By moving the camera forward, the disparity is estimated
using simple triangulation. For simplicity, all the object
surfaces are assumed to be flat and oriented in either the horizontal
direction, i.e., parallel to the image plane, or the vertical direction,
i.e., parallel to the ground plane. Therefore, only the distances for
horizontal surfaces and the heights for vertical surfaces need to be
found. To achieve subpixel accuracy, an image is interpolated
according to the predicted disparity values obtained by a search
process and occlusion effects. Based on the error between real
and interpolated images, the correct orientation of each surface
can be detected, and hence a segmented image consisting of refined
synthetic surfaces can be obtained. For implementation
purposes, an iterative segmentation procedure is employed and
the systematic changes of distance and height embodied in the
synthetic segmented image at each iteration are used for finding
the correct distance and height. Experimental results demonstrate
the usefulness of this approach for simple natural image
sequences. Since only planar surfaces with one of two different
orientations are assumed to exist in natural images, areas
corresponding to either non-planar surfaces or planar surfaces with
other orientations are not correctly interpolated and therefore
the estimated distances and heights for these areas are not
reliable. Furthermore, this approach requires information about
the focus of expansion (FOE) and the final result depends very
much on the quality of initial segmentation.

F-49620-87-C-0007 and the AFOSR Grant No. 86-0196.

Instead of computing depth in image space, Jain, Bartlett
and O'Brien [4] developed a method for estimating the depth of
feature points (corners) in the ego-motion complex logarithmic
mapping (ECLM) space. They showed that the axial movement
of the camera causes only a horizontal but not a vertical change
in the mapping of image points. Therefore, the depth of a feature
point can be determined from the horizontal displacement
in the ECLM for that point and from the camera velocity in the
gaze direction. However, the mapping is very sensitive to noise,
spatial quantization error and image blur, requiring some heuristics
to establish the correspondence of points, such as thresholds
for maximum possible changes in the vertical direction and an
upper bound for the search range in the horizontal direction
in the ECLM space. Also the FOE for arbitrary translation
of the camera and the feature points (corners) are assumed to
be known. Another motion stereo method using feature points
(corners) for computing depth in image space can be found in
[5].

Recently, Xu, Tsuji and Asada [6] have suggested a coarse-to-fine
iterative matching method for motion stereo. By sliding
a camera along a straight line, a sequence of images is taken
at predetermined positions. The pair with the shortest baseline is
matched first to produce a coarse disparity map based on the
zero-crossings. Then the coarse disparity map is used to reduce
the search range for the pair with the next longer baseline. This
procedure is continued until the pair with the largest baseline is
processed. One major advantage of this method is that occlusions
can be predicted from the previous disparity map to avoid
mismatches at the present step. Although the computation time
is less compared to other coarse-to-fine methods, this method
gives only a sparse disparity map and cannot be implemented
on line.

Matthies, Szeliski and Kanade [7] have introduced two real
time approaches, based on intensity values and features, using a
Kalman filtering technique. A sequence of lateral motion images
is generated by moving a camera along a straight line from left
to right (or right to left). The intensity based approach consists
of four stages for each frame. First, a new measurement of disparity
at each pixel is obtained by using a correlation matching
procedure. Then the estimate of disparity is updated using a
Kalman filter update equation based on the new measurement.
Third, a generalized piecewise continuous spline technique is
used to smooth the updated estimate. Finally, the disparity for
each pixel in the next frame is given by the prediction procedure.
As reported in [7], the intensity based approach is more efficient
than the feature based approach. But a major problem in the
intensity based approach is that once the updated estimate is
smoothed in the third stage, the gain of the Kalman filter and
the error variance of the estimate are no longer correct so that
they cannot be used in the next iteration.

Attempting to achieve human-like performance, many researchers
have been using neural networks to solve the matching
problem based on binocular images, i.e., one pair of images
[8, 9, 10, 11, 12, 13, 14, 15]. In this paper, we first present a
neural network based batch algorithm for motion stereo. The
conventional batch algorithms implement the matching procedure
many times, requiring a lot of computation. For example, if
there are M image frames, the matching procedure has to be
implemented (M - 1) times to obtain (M - 1) disparity measurements
for each pixel. To obtain a good estimate of the disparity
value from these measurements, usually a filtering procedure is
required. Instead of doing matching (M - 1) times, the neural
network batch approach implements the matching algorithm
only once by simultaneously using all the image pairs so that
computational complexity is greatly reduced. We also present
a real time algorithm using a neural network. Basically, we use
a recursive least squares (RLS) algorithm to update the bias
inputs of the network whenever the next frame becomes available.
After all images are received, matching is carried out by
neuron evaluation, minimizing the energy function of the network.
The disparity values are then given by the neuron states. If
the intermediate results are required, one can implement the
matching procedure for every pair of images. Unlike [7], this
method runs the matching algorithm only once and needs no
interpolation procedure. Since the neural network can be run
in parallel and the RLS algorithm can be implemented on line,
the recursive algorithm is extremely fast and hence useful for
real time robot vision applications. As the derivatives of the
intensity function are more reliable than the intensity values and
are dense, both algorithms use the derivatives as measurement
primitives for matching. The chamfer distance information is
also used for matching to overcome the lack of information in
homogeneous regions. Recognizing that occlusions might affect
the matching accuracy very seriously when a long sequence of
image frames is involved, we have designed an algorithm for
locating occluding pixels. Once an occluding pixel is detected, the
occlusion information is embedded into the network by resetting
the bias inputs so that the network automatically takes care of
occluding pixels during the updating procedure.

2 Depth from Motion

2.1  Camera  Configuration 

It is assumed that a sequence of images is taken by a camera
moving with a constant velocity from right to left along
a straight line, as shown in Figure 1. Several assumptions are
made to simplify the problem. First, it is assumed that the
optical axis of the camera is perpendicular to the moving direction
and the horizontal axes of the image planes are parallel
to the moving direction. This constraint on the camera
configuration restricts the search to the horizontal
direction only, the so-called epipolar constraint. Second, it
is assumed that the camera takes pictures exactly t seconds
apart. Thus, all images are equally separated, i.e. each
successive image pair has the same baseline.

Figure 1: Camera geometry for motion stereo.

Let OXYZ be the world coordinate system with the Z axis directed
along the camera optical axis and let $o_p x_p y_p$ be the pth image
plane coordinate system. The origin of the pth image system is
located at $(0, -(p-1)vt, f)$ of the world system, where v is the
velocity of the camera, vt is the distance between two successive
images and f is the focal length of the lens, which takes a positive
value in the world system. Under perspective projection, a point
in the world, $(X_0, Y_0, Z_0)$, projects into the pth image plane at

$$(x_p, y_p) = \left( \frac{f X_0}{Z_0},\ \frac{f\,(Y_0 + (p-1)vt)}{Z_0} \right) \tag{1}$$

Theoretically, the disparity $D_0$ can be derived from two successive
image frames:

$$D_0 = y_p - y_{p-1} = \frac{f d}{Z_0} \tag{2}$$

where $d = vt$ is the baseline.
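The relation between disparity and depth can be illustrated with a short sketch. It assumes the pinhole geometry described above (horizontal image coordinate equal to f times the shifted world coordinate divided by depth); the function names and the sample numbers are hypothetical and only serve the illustration.

```python
def project_y(y0, z0, p, focal_length, baseline):
    """Horizontal image coordinate of a world point in frame p (p = 1, 2, ...).

    Assumes the camera has moved (p - 1) * baseline since the first frame.
    """
    return focal_length * (y0 + (p - 1) * baseline) / z0


def depth_from_disparity(disparity, focal_length, baseline):
    """Invert D0 = f * d / Z0 to recover the depth Z0 (same length unit as f and d)."""
    if disparity <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length * baseline / disparity


# Hypothetical numbers: f = 50, baseline d = 2, point at Y0 = 10, Z0 = 100.
y1 = project_y(10.0, 100.0, 1, 50.0, 2.0)
y2 = project_y(10.0, 100.0, 2, 50.0, 2.0)
recovered_depth = depth_from_disparity(y2 - y1, 50.0, 2.0)
```

Because the disparity is inversely proportional to depth, a fixed disparity quantization gives coarser depth resolution for distant points.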


2.2  Estimation  of  Pixel  Positions 

Let (i,j) be the position of the (i,j)th pixel in the first frame.
In the successive frames, due to camera motion, the positions
of all pixels are shifted to the right by vt. Under the epipolar
constraint, the shift happens only in the horizontal direction.
For example, the (i,j)th pixel moves from position (i,j) in the
first frame to position $(i, j + d_{i,j})$ in the second frame. Let $S_{i,j}(p)$
be the total shift of pixel (i,j) from the first to the pth frame.
Thus

$$S_{i,j}(p) = (p-1)\, d_{i,j} \tag{3}$$

where $d_{i,j}$ is the true disparity value for pixel (i,j). Note that
the shift $S_{i,j}(p)$ is continuous because $d_{i,j}$ is a continuous variable.
A rounding operation has to be applied to $S_{i,j}(p)$ for locating
the (i,j)th pixel in the subsampled image. After rounding, the
position of the (i,j)th pixel in the pth frame is given by

$$\left(i,\ j + \left[\frac{S_{i,j}(p)}{W}\right] W\right) \tag{4}$$

where [ ] is a rounding operator and W is the width of the
subsampling interval. It can be simply written as

$$(i,\ j + kW) \tag{5}$$

where

$$k = \left[\frac{S_{i,j}(p)}{W}\right].$$
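As a small worked example of the rounding step, the sketch below computes the total shift of a pixel and its quantized position in the pth frame. It assumes that the rounding operator rounds to the nearest integer (halves rounded up), which the text does not specify; the function names are hypothetical.

```python
import math


def round_half_up(x):
    """Assumed rounding operator [ ]: nearest integer, halves rounded up."""
    return math.floor(x + 0.5)


def pixel_position(i, j, disparity, p, bin_width):
    """Quantized position of pixel (i, j) in frame p, plus the bin index k."""
    shift = (p - 1) * disparity            # total shift from frame 1 to frame p
    k = round_half_up(shift / bin_width)   # shift expressed in bins of width W
    return (i, j + k * bin_width), k


# Hypothetical values: disparity 0.43 pixels per frame, third frame, W = 0.2.
position, k = pixel_position(3, 10, 0.43, 3, 0.2)
```

With these numbers the total shift is 0.86 pixels, which falls into bin k = 4, so the pixel is located at column 10.8 of the subsampled third frame.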


3  Matching  Algorithms 

Two algorithms, batch and recursive, are presented in this section.

3.1  Batch  Algorithm 

A discrete neural network is used for representing the disparity
field. The network consists of $N_r \times N_c \times (D+1)$ mutually
interconnected binary neurons, where D is the maximum disparity and
$N_r$ and $N_c$ are the image row and column sizes, respectively.
Let $V = \{v_{i,j,k},\ 1 \le i \le N_r,\ 1 \le j \le N_c,\ 0 \le k \le D\}$ be a
binary state set of the neural network with $v_{i,j,k}$ (1 for firing and
0 for resting) denoting the state of the (i,j,k)th neuron. For
each pixel, say (i,j), we use (D+1) mutually exclusive neurons
$\{v_{i,j,0}, v_{i,j,1}, \ldots, v_{i,j,D}\}$ to represent the disparity value.
Theoretically, disparity takes continuous values. For implementation
purposes, we sample the disparity range using bins of size W.
Hence, when $v_{i,j,k}$ is 1, the disparity value is
kW at the pixel (i,j).

The neural network parameters, the interconnection strengths
$T_{i,j,k;l,m,n}$ and the bias inputs $I_{i,j,k}$, can be determined in terms
of the energy function of the network. The energy function of
the neural network is defined as

$$E = -\frac{1}{2} \sum_{i=1}^{N_r}\sum_{j=1}^{N_c}\sum_{l=1}^{N_r}\sum_{m=1}^{N_c}\sum_{k=0}^{D}\sum_{n=0}^{D} T_{i,j,k;l,m,n}\, v_{i,j,k}\, v_{l,m,n} - \sum_{i=1}^{N_r}\sum_{j=1}^{N_c}\sum_{k=0}^{D} I_{i,j,k}\, v_{i,j,k} \tag{6}$$

In order to use the spontaneous energy-minimization process of
the neural network, we reformulate the stereo matching problem
under the epipolar assumption as one of minimizing an error
function with constraints. Suppose that M image frames are
used for matching. The error function can then be written as

$$E = \frac{1}{M-1}\sum_{i=1}^{N_r}\sum_{j=1}^{N_c}\sum_{k=0}^{D}\sum_{p=1}^{M-1} \Big[ \big(g'_p(i, j+(p-1)kW) - g'_{p+1}(i, j+pkW)\big)^2 + \kappa \big(f_p(i, j+(p-1)kW) - f_{p+1}(i, j+pkW)\big)^2 \Big]\, v_{i,j,k} + \frac{\lambda}{2}\sum_{i=1}^{N_r}\sum_{j=1}^{N_c}\sum_{k=0}^{D}\sum_{s\in S} \big(v_{i,j,k} - v_{(i,j)\oplus s,k}\big)^2 \tag{7}$$

where $\{g'_p(\cdot)\}$ and $\{f_p(\cdot)\}$ denote the intensity derivatives and
the chamfer distance values at $(\cdot)$ of the pth frame, respectively,
S is an index set excluding (0,0) for all the neighbors in a $\Gamma \times \Gamma$
window centered at point (i,j), and $\lambda$ and $\kappa$ are constants. The first
term in (7), called the photometric constraint, seeks disparity values
such that all regions of two images are matched in a least
squares sense. The second term is the smoothness
constraint on the solution. The constant $\kappa$ determines the relative
importance of the two kinds of measurement primitives,
derivatives and distance values, and the constant $\lambda$ determines
the tradeoff between the two terms to achieve the best results.

The interconnection strengths and bias inputs are determined
by comparing the terms in the expansion of (7) with the
corresponding terms in (6):

$$T_{i,j,k;l,m,n} = -48\lambda\, \delta_{i,l}\,\delta_{j,m}\,\delta_{k,n} + 2\lambda \sum_{s\in S} \delta_{(i,j),(l,m)\oplus s}\, \delta_{k,n} \tag{8}$$

$$I_{i,j,k} = -\frac{1}{M-1}\sum_{p=1}^{M-1} \Big[ \big(g'_p(i,j+(p-1)kW) - g'_{p+1}(i,j+pkW)\big)^2 + \kappa\big(f_p(i,j+(p-1)kW) - f_{p+1}(i,j+pkW)\big)^2 \Big] \tag{9}$$

where $\delta_{a,b}$ is the Dirac delta function. The size of the smoothing
window used in (8) is 5. However, one can choose either a
larger or smaller window. From (8) one can see that the
interconnections are symmetric and that the self-feedback weight
$T_{i,j,k;i,j,k}$ is not zero. Also note that the bias inputs consist of the
measurement primitives only and the interconnection strengths contain
no information about the images.
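To make the role of the bias inputs concrete, the following sketch evaluates them for one pixel position. It assumes that the bias input is the negative averaged matching error over successive frame pairs, and it represents each frame's derivative and chamfer signals as plain Python callables of a (possibly subpixel) horizontal position; this representation, the function name and the synthetic data are illustrative assumptions, not the authors' implementation.

```python
import math


def bias_inputs(x, deriv_frames, chamfer_frames, num_levels, bin_width, kappa):
    """Bias inputs for k = 0 .. num_levels - 1 at horizontal position x.

    deriv_frames[q] and chamfer_frames[q] are callables for frame q + 1.
    """
    m = len(deriv_frames)                      # number of frames M
    biases = []
    for k in range(num_levels):
        total = 0.0
        for p in range(1, m):                  # frame pairs (p, p + 1), p = 1 .. M-1
            a = x + (p - 1) * k * bin_width    # sample position in frame p
            b = x + p * k * bin_width          # sample position in frame p + 1
            dg = deriv_frames[p - 1](a) - deriv_frames[p](b)
            df = chamfer_frames[p - 1](a) - chamfer_frames[p](b)
            total += dg ** 2 + kappa * df ** 2
        biases.append(-total / (m - 1))        # negative mean matching error
    return biases


def shifted_frame(q, disparity):
    """Synthetic derivative signal for frame q + 1: a sine shifted by q * disparity."""
    return lambda x: math.sin(x - q * disparity)


# Three synthetic frames with true disparity 0.4 (two bins of width 0.2).
frames = [shifted_frame(q, 0.4) for q in range(3)]
flat = [lambda x: 0.0] * 3                     # chamfer term switched off for the demo
biases = bias_inputs(1.0, frames, flat, num_levels=5, bin_width=0.2, kappa=5.0)
best_k = max(range(5), key=lambda k: biases[k])
```

In this synthetic case the largest bias input occurs at k = 2, i.e. at disparity kW = 0.4, which is the shift used to generate the frames.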

Matching is carried out by neuron evaluation. Once the
parameters $T_{i,j,k;l,m,n}$ and $I_{i,j,k}$ are obtained using (8) and (9), each
neuron can synchronously evaluate its state and readjust according
to the updating equations

$$u_{i,j,k} = \sum_{l=1}^{N_r}\sum_{m=1}^{N_c}\sum_{n=0}^{D} T_{i,j,k;l,m,n}\, v_{l,m,n} + I_{i,j,k} \tag{10}$$

and

$$v_{i,j,k} = g(u_{i,j,k}) \tag{11}$$

where $g(\cdot)$ is a maximum evolution function

$$g(x_{i,j,k}) = \begin{cases} 1 & \text{if } x_{i,j,k} = \max(x_{i,j,l};\ l = 0, 1, \ldots, D), \\ 0 & \text{otherwise.} \end{cases}$$

Note that the synchronous updating scheme can be implemented
in parallel. The uniqueness of the matching is ensured by
a batch updating scheme: the D + 1 neurons $\{v_{i,j,0}, \ldots, v_{i,j,D}\}$ of pixel
(i,j) are updated simultaneously at each step.

The initial states of the neurons were set as

$$v_{i,j,k} = \begin{cases} 1 & \text{if } I_{i,j,k} = \max(I_{i,j,l};\ l = 0, 1, \ldots, D), \\ 0 & \text{otherwise,} \end{cases}$$

where $I_{i,j,k}$ is the bias input.

As mentioned in [16, 18], the self-feedback may cause the energy
function to increase with a transition. A deterministic decision
rule is used to ensure convergence of the network, probably to
a local minimum. The batch matching algorithm can then be
summarized as

Batch  matching  algorithm: 

1.  Estimate  the  network  inputs. 

2.  Set  the  initial  state  of  the  neurons. 

3.  Update  the  state  of  all  neurons  synchronously  according 
to  the  deterministic  decision  rule. 

4.  Check  the  energy  function;  if  energy  does  not  change  any¬ 
more,  stop;  otherwise,  go  back  to  step  3. 
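The four steps above can be sketched for a single image row. This is a minimal illustration under stated assumptions, not the authors' implementation: the smoothing window is reduced to a 1-D neighbourhood of a given radius, the net input uses a self-feedback weight of minus twice the neighbour count times lambda (the 1-D analogue of the -48 lambda term), and the deterministic decision rule is assumed to mean "accept a synchronous update only if it lowers the energy". All names are hypothetical.

```python
def energy(bias, v, lam, radius):
    """Error-function energy for one row: data term plus smoothness term."""
    n, levels = len(v), len(v[0])
    data = -sum(bias[j][k] * v[j][k] for j in range(n) for k in range(levels))
    smooth = 0.0
    for j in range(n):
        for s in range(-radius, radius + 1):
            if s != 0 and 0 <= j + s < n:
                smooth += sum((v[j][k] - v[j + s][k]) ** 2 for k in range(levels))
    return data + 0.5 * lam * smooth


def winner_take_all(values):
    """Maximum evolution function: fire only the neuron with the largest input."""
    best = max(range(len(values)), key=lambda k: values[k])
    return [1 if k == best else 0 for k in range(len(values))]


def batch_match(bias, lam=1.0, radius=1, max_iters=100):
    """Return (disparity indices, final energy) for one row of bias inputs."""
    n, levels = len(bias), len(bias[0])
    v = [winner_take_all(row) for row in bias]           # step 2: initial states
    e = energy(bias, v, lam, radius)
    for _ in range(max_iters):
        candidate = []
        for j in range(n):                               # step 3: synchronous update
            nbrs = [j + s for s in range(-radius, radius + 1)
                    if s != 0 and 0 <= j + s < n]
            u = [bias[j][k]
                 + 2 * lam * sum(v[m][k] for m in nbrs)  # support from neighbours
                 - 2 * lam * len(nbrs) * v[j][k]         # self-feedback term
                 for k in range(levels)]
            candidate.append(winner_take_all(u))
        e_new = energy(bias, candidate, lam, radius)
        if e_new >= e:                                   # step 4: assumed decision rule
            break
        v, e = candidate, e_new
    return [row.index(1) for row in v], e


# Step 1 stand-in: hand-made bias inputs; the middle pixel has a noisy preference.
row_bias = [[-4.0, -1.0, -6.0]] * 2 + [[-3.0, -2.0, -1.5]] + [[-4.0, -1.0, -6.0]] * 2
disparities, final_energy = batch_match(row_bias)
```

In this toy row the initial winner-take-all assignment gives the middle pixel a different disparity from its neighbours; one accepted update removes that outlier, and the next proposed update is rejected because the self-feedback term would raise the energy again.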

3.2  Recursive  Algorithm 

This  algorithm  basically  consists  of  two  steps:  bias  input  update 
and  stereo  matching.  Whenever  a  new  frame  of  image  becomes 
available,  the  bias  inputs  of  the  network  are  updated  by  the  RLS 
algorithm. 

Suppose that images are corrupted by additive white noise
and the measurement model is given by

$$\tilde{I}_{i,j,k}(p) = h\big(p,\ g'_p(i,j+(p-1)kW),\ f_p(i,j+(p-1)kW),\ g'_{p+1}(i,j+pkW),\ f_{p+1}(i,j+pkW),\ n_{i,j,k}(p)\big)$$

$$= -\big(g'_p(i,j+(p-1)kW) - g'_{p+1}(i,j+pkW)\big)^2 - \kappa\big(f_p(i,j+(p-1)kW) - f_{p+1}(i,j+pkW)\big)^2$$

for p = 1, 2, ..., M - 1

where h is a measurement function and $n_{i,j,k}(p)$ is noise. For p
such measurements, find a function

$$\hat{I}_{i,j,k}(p) = F\big(\tilde{I}_{i,j,k}(p),\ \tilde{I}_{i,j,k}(p-1),\ \ldots,\ \tilde{I}_{i,j,k}(1)\big) \tag{14}$$

that estimates the value of the bias input $I_{i,j,k}$ in some sense.
The value of the function is the estimate. If the measurement
function is linear and the measurement noise is white, then a
Kalman filter is commonly used for finding an optimal estimate.
In (14), as the measurement function is nonlinear and the
measurement noise is no longer white but is dependent on
measurements, the linear Kalman filter does not yield a good estimate.
In contrast to the Kalman filter, the RLS algorithm does not
make any assumption about the measurement function and noise.
Hence, the RLS algorithm can be used to update the bias
inputs. When the pth frame becomes available, the bias input is
updated by

$$\hat{I}_{i,j,k}(p) = \hat{I}_{i,j,k}(p-1) + \frac{1}{p}\big(\tilde{I}_{i,j,k}(p) - \hat{I}_{i,j,k}(p-1)\big). \tag{15}$$

This RLS algorithm is equivalent to the batch least squares
algorithm with the initial condition

$$\hat{I}_{i,j,k}(0) = 0.$$
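The recursive update can be checked numerically with a few lines of code. The sketch assumes the update has the running-average form "previous estimate plus one over p times the innovation", started from zero; the function name and the sample measurements are hypothetical.

```python
def rls_update(previous_estimate, measurement, p):
    """One recursive least squares step for the p-th measurement (p = 1, 2, ...)."""
    return previous_estimate + (measurement - previous_estimate) / p


# Hypothetical per-frame-pair measurements of one bias input.
measurements = [-4.0, -2.0, -3.0]

estimate = 0.0                       # initial condition: estimate at p = 0 is zero
for p, measurement in enumerate(measurements, start=1):
    estimate = rls_update(estimate, measurement, p)

batch_mean = sum(measurements) / len(measurements)
```

After the last measurement the recursive estimate coincides with the batch average of all measurements, which is why the matching step only needs to be run once at the end of the sequence.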

The interconnection strengths are the same as in (8) and the
bias input is given by (15). Since the bias inputs are recursively
updated and contain all the information about the previous images,
we do not have to run the matching algorithm at
every recursion if the intermediate results are not required. This
greatly reduces the computational load and therefore makes the
method extremely fast. Formally, the algorithm is as follows:

Recursive  matching  algorithm: 

1.  Update  the  bias  inputs  using  the  RLS  algorithm. 

2.  Initialize  the  neuron  states. 

3.  If  there  is  a  new  frame  coming,  go  back  to  step  1;  otherwise 
go  to  step  4. 

4.  Update the neuron states using (10) and (11) with the
deterministic decision rule.

This algorithm has several advantages over the correlation
algorithm of [7]:

1.  This  algorithm  recursively  updates  the  bias  inputs  instead 
of  the  disparity  values.  The  matching  algorithm  is  imple¬ 
mented  only  once. 

2.  This  algorithm  incorporates  the  smoothness  constraint  into 
the  matching  procedure  instead  of  using  an  extra  smooth¬ 
ing  procedure. 

3.  This algorithm uses the derivatives of the intensity function,
which are more reliable than the intensity values, as
measurement primitives. Hence it is suitable for natural
images.


4  Estimation  of  Measurement  Primitives

In  this  section  we  present  two  methods  for  estimation  of  the 
derivatives  and  chamfer  distance  values. 

4.1  Estimation  of  Derivatives 

As only derivatives in the horizontal direction are required for
matching, the epipolar constraint saves a lot of computation.
By using a set of univariate discrete Chebyshev polynomials to
approximate the intensity function in a window, we have

$$g(i, j+y) = A^t\, CH(y) \tag{16}$$

where g(i, j+y) is the approximated continuous intensity function,
t denotes the transpose operator,

$$A^t = [a_0, a_1, a_2, a_3, a_4] \tag{17}$$

is the coefficient vector and

$$CH^t(y) = [Ch_0(y), Ch_1(y), Ch_2(y), Ch_3(y), Ch_4(y)] \tag{18}$$

is the polynomial vector defined over an index set
$\Omega = \{-\omega, -\omega+1, \ldots, \omega-1, \omega\}$, i.e. over a window of size
$2\omega+1$, as in [15]. The coefficients $\{a_m\}$ are estimated using

$$a_m = \frac{\sum_{y'\in\Omega} Ch_m(y')\, g(i, j+y')}{\sum_{y'\in\Omega} Ch_m^2(y')} \tag{19}$$

where $\{g(i, j+y')\}$ are the observed intensity values.

The first order derivatives of the intensity function at subpixel
position (i, j+y) can be calculated by

$$\left.\frac{\partial g(i,x)}{\partial x}\right|_{x=j+y} = \frac{d\, g(i,j+y)}{dy} = A^t\, \frac{d}{dy} CH(y) \qquad \text{for } -0.5 < y < 0.5 \tag{20}$$

For simplicity of notation, we use $g'(i, j+y)$ to represent the
first order partial derivatives of the subpixel intensity function.
Using (17), (18) and (19) in (20) we get

$$g'(i, j+y) = \sum_{y'\in\Omega} W(y,y')\, g(i, j+y'), \qquad W(y,y') = \sum_{m=0}^{4} \frac{Ch_m(y')\, \frac{d}{dy}Ch_m(y)}{\sum_{u\in\Omega} Ch_m^2(u)} \tag{21}$$

In (21), W(y,y') can be considered as a window operator. Hence,
the derivatives can be obtained by convolving a window W(y,y')
with the input image. The window size can be determined by
properly considering the effects of image noise and spatial
quantization error [16].
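The window operator can be sketched without writing out the Chebyshev polynomials explicitly. The code below assumes that any polynomial basis orthogonal over the window index set gives the same least squares fit, and builds such a basis by Gram-Schmidt orthogonalization of the monomials; this stands in for the discrete Chebyshev polynomials and is not taken from the text. Function names are hypothetical.

```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial given as [c0, c1, c2, ...] at x."""
    return sum(c * x ** n for n, c in enumerate(coeffs))


def poly_derivative(coeffs):
    """Coefficient list of the derivative polynomial."""
    return [n * c for n, c in enumerate(coeffs)][1:] or [0.0]


def orthogonal_basis(omega, degree=4):
    """Polynomials of degree 0..degree, mutually orthogonal over {-omega..omega}."""
    points = range(-omega, omega + 1)
    basis = []
    for m in range(degree + 1):
        coeffs = [0.0] * m + [1.0]                      # start from the monomial x^m
        for b in basis:
            norm = sum(poly_eval(b, u) ** 2 for u in points)
            proj = sum(poly_eval(coeffs, u) * poly_eval(b, u) for u in points) / norm
            coeffs = [c - proj * (b[n] if n < len(b) else 0.0)
                      for n, c in enumerate(coeffs)]
        basis.append(coeffs)
    return basis


def derivative_window(omega, y, degree=4):
    """Weights W(y, y') for y' in {-omega..omega} at subpixel offset y."""
    points = list(range(-omega, omega + 1))
    basis = orthogonal_basis(omega, degree)
    weights = []
    for yp in points:
        w = 0.0
        for b in basis:
            norm = sum(poly_eval(b, u) ** 2 for u in points)
            w += poly_eval(b, yp) * poly_eval(poly_derivative(b), y) / norm
        weights.append(w)
    return weights


def subpixel_derivative(row, j, y, omega=2):
    """Horizontal derivative of one image row at position j + y."""
    weights = derivative_window(omega, y)
    return sum(w * row[j + yp] for w, yp in zip(weights, range(-omega, omega + 1)))


# Synthetic quadratic intensity profile g(x) = 2 + 3x + 0.5x^2, so g'(x) = 3 + x.
row = [2 + 3 * x + 0.5 * x * x for x in range(10)]
estimated = subpixel_derivative(row, 5, 0.25)
```

For the quadratic test profile the estimate at position 5.25 agrees with the analytic derivative 8.25 up to rounding error, since a degree-4 fit reproduces any quadratic exactly. The window needs at least five samples (omega of 2 or more) for a degree-4 basis.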


4.2  Estimation  of  Chamfer  Distance  Values 

To estimate the distance values, two steps are involved: converting
the intensity image into a binary image consisting of
edge and non-edge pixels, and transforming the binary image into
the chamfer image.

Many conventional edge detectors can be used to find edge
pixels from the intensity image. Since matching is restricted
to the horizontal direction, only the horizontal chamfer edge
information is needed and therefore only the vertical edges have
to be detected. For simplicity of implementation, we use the
Prewitt edge detector [17] with a window size of 3 x 3 for detecting
the vertical edges.

The chamfer distance values are iteratively determined using
the following algorithm

$$f^{l}_{i,j} = \min\big(f^{l-1}_{i,j-1} + 2,\ f^{l-1}_{i,j+1} + 2\big) \tag{22}$$

where $f^{l}_{i,j}$ is the distance value at the (i,j)th pixel and l
denotes the iteration number. Initially, the distance values are
set to zero for edge pixels and nonzero (say, 1000) for non-edge
pixels. The edge pixels obviously get the value zero. This
algorithm is completely parallel and the number of iterations is equal
to the longest distance value occurring in the image.
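The iteration can be sketched for one image row as follows. The sketch assumes a two-neighbour update with a step cost of 2, keeps edge pixels fixed at zero as stated in the text, and treats positions outside the row as absent neighbours; the function name and the stopping test (stop when nothing changes) are illustrative choices.

```python
def horizontal_chamfer(edge_row, far_value=1000, max_iters=None):
    """Horizontal chamfer distances for one row of edge flags (True = edge pixel)."""
    n = len(edge_row)
    dist = [0 if is_edge else far_value for is_edge in edge_row]
    if max_iters is None:
        max_iters = n                          # a row of n pixels needs at most n sweeps
    for _ in range(max_iters):
        new = []
        for j in range(n):
            if edge_row[j]:
                new.append(0)                  # edge pixels always keep the value zero
                continue
            neighbours = [dist[m] for m in (j - 1, j + 1) if 0 <= m < n]
            new.append(min(neighbours) + 2 if neighbours else dist[j])
        if new == dist:                        # converged: no value changed
            break
        dist = new                             # all pixels are updated in parallel
    return dist


edges = [False, True, False, False, False, True, False]
distances = horizontal_chamfer(edges)
```

Every pixel reads only the previous iteration's values, so the update of a whole row (or image) can be carried out in parallel, as the text notes. A row without any edge pixel never receives a finite distance; the iteration cap keeps the loop bounded in that case.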

5  Detection  of  Occlusions 

Detection of occluding pixels is an important issue in motion
stereo. In this section we first discuss the nature of occlusion
and then derive a mean matching error. Finally, a detection
algorithm is given for locating occluding pixels based on the
mean matching error.

5.1  Occlusions 

As shown in Figure 2(a), when a camera moves from right to left,
points 2, 3, 4 and 5 project into the first image plane at 2', 3', 4'
and 5'. But points 2 and 3 on the object surface will not project
into the second image plane because they are occluded by the
front surface on which point 1 lies. Similarly, points 2, 3, 4 and
5 will not appear in image plane 3. At the location of pixels
2', 3', 4' and 5', the match error is usually large, which means no
conjugate pixels can be found in the successive image frames.
Hence, the disparity values are undetermined. Such pixels are
called occluding pixels. When a smoothness constraint is used,
although the matching algorithm always assigns some values to
the occluding pixels, the discontinuities of the disparity field
may be shifted. As the number of frames increases, the number
of occluding pixels will increase dramatically too. For instance,
if only two object points are occluded for the second image as
shown in Figure 2(a), then about ten points are occluded for
the sixth image, which gives a ten pixel wide occluding region
in the first image plane. On the other hand, if only the first
two frames are used for matching, then pixels 4' and 5' are
not occluding pixels and therefore the disparity values at such
locations are determinable. As the third frame does not provide
any information about pixels 4' and 5', there is no need to update
the bias inputs at the location of these pixels.

However, in some cases the number of occluding pixels does
not increase as the number of frames increases. One typical
example is illustrated in Figure 2(b), where pixels 4' and 5' are
not occluding pixels when the third image frame is used.

(a) Pixels 2', 3', 4' and 5' are occluding pixels.

Figure 2: Occluding pixels.

5.2  Matching  Error 

When multiple frames are used for matching, the spatial
quantization error usually causes a matching error. In this section,
we derive a mean matching error which can be used to detect
occluding pixels. It is assumed that the true disparity at pixel
(i,j) in a smooth region can be expressed as

$$d_{i,j} = kW + \delta_{i,j}$$

where $\delta_{i,j}$ is uniformly distributed in $(-W/2,\ W/2)$, and that the first
order derivative of the intensity function at point $(i, j+(p-1)kW)$
of the pth image can be expanded as a Taylor series
about the point $(i, j+(p-1)(kW+\delta_{i,j}))$ as

$$g'_p(i, j+(p-1)kW) = g'_p\big(i, j+(p-1)(kW+\delta_{i,j})\big) - (p-1)\,\delta_{i,j}\, g''_p\big(i, j+(p-1)(kW+\delta_{i,j})\big) + O(\delta_{i,j}^2) \qquad \text{for } p = 2, 3, \ldots, M \tag{23}$$

where $g''(\cdot)$ denotes the second order derivative. The best
estimate of the disparity value is given by

$$\hat{d}_{i,j} = kW.$$

The matching error can be approximately written as

$$error(i,j)_k \approx \frac{1}{M-1}\sum_{p=1}^{M-1}\Big[ g'_p\big(i, j+(p-1)(kW+\delta_{i,j})\big) - g'_{p+1}\big(i, j+p(kW+\delta_{i,j})\big) - (p-1)\,\delta_{i,j}\, g''_p\big(i, j+(p-1)(kW+\delta_{i,j})\big) + p\,\delta_{i,j}\, g''_{p+1}\big(i, j+p(kW+\delta_{i,j})\big) \Big]^2 \tag{24}$$

When images are corrupted by an additive white noise variable
with variance $\sigma^2$, the mean error at (i,j) becomes [18]

$$E\{error(i,j)_k\} = C_1\, \sigma^2 + C_2\, \big(g''(i,j)\big)^2 \tag{25}$$

where $C_1$ and $C_2$ are determined by

and [ ] is a rounding operator. If the variance of the noise and the
second order derivatives of the intensity function are known or
estimated from the images, then the mean error corresponding
to the disparity value k at every point for a given $\omega$ (window
size), M (frame number) and W (width of subsample interval)
can be calculated.

5.3  Detection  of  Occluding  Pixels 

The intuitive analysis given in Section 5.1 essentially suggests
a method for detecting occluding pixels. Using the mean
matching error derived above, the following detection rule can
be used for detecting occluding pixels, and hence we can prevent
the RLS algorithm from updating the bias input at the location
of occluding pixels.

Detection rule: An occluding pixel at location (i,j) is
detected if

$$\min\big(-I_{i,j,k}\big|_{\kappa=0};\ 0 \le k \le D\big) > \max\big(E\{error(i,j)_k\};\ 0 \le k \le D\big) + b \tag{26}$$

where b > 0 is a constant for raising the threshold. When the
noise variance is unknown, one can use a constant threshold
instead of the mean error.
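A minimal sketch of the detection rule is given below. It assumes that the bias inputs passed in are negative matching errors (computed from the derivative term only), so that the smallest matching error over all disparity levels is the minimum of the negated bias inputs; the function name and the sample values are hypothetical.

```python
def is_occluding(bias_inputs, expected_errors=None, margin=0.0,
                 constant_threshold=None):
    """Flag a pixel as occluding when even its best match error is too large.

    bias_inputs: one value per disparity level, assumed to be negative errors.
    expected_errors: mean matching error per disparity level, if available.
    constant_threshold: fallback threshold when the noise variance is unknown.
    """
    best_match_error = min(-b for b in bias_inputs)
    if constant_threshold is not None:
        threshold = constant_threshold
    else:
        threshold = max(expected_errors) + margin
    return best_match_error > threshold


# Constant-threshold variant (the experiments in this paper use a threshold of 150).
occluded = is_occluding([-400.0, -180.0, -260.0], constant_threshold=150.0)
matched = is_occluding([-400.0, -20.0, -260.0], constant_threshold=150.0)
```

The first pixel is flagged because no disparity level brings its matching error below the threshold, while the second pixel has one well-matching level and is kept.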

For the recursive algorithm, once an occluding pixel is
detected, the bias inputs of neurons at such locations will not be
updated anymore. But during the first iteration, the bias inputs
of neurons at the locations of occluding pixels are first updated
and then corrected accordingly. The correction procedure is as
follows. From Figure 2 it can be seen that the pixels on the left
side of the occluding region have high disparity values and the
pixels on the right side have low disparity values. The width of
the occluding region, i.e. the number of occluding pixels, is
approximately given by

Ay  =  (27) 

where [ ] is a rounding operator, (i,j) denotes the location of the
leftmost occluding pixel, $\Delta_{i,j}$ is the true width, $\hat{\Delta}_{i,j}$ is an estimate
of the width, and $D_{i,j-1}$ and $D_{i,j+\Delta_{i,j}}$ are the disparity values
of the nearest left and right nonoccluding pixels, respectively.
Since a smoothness constraint is used, the occluding pixel
usually takes either the high disparity value $D_{i,j-1}$ or the low disparity
value $D_{i,j+\Delta_{i,j}}$. The discontinuities of the disparity field can be
detected by checking the disparity values in the y direction for
a transition from the high value to the low value. Starting
with the discontinuity pixel, we check all the left and right
neighbor pixels. The search procedure does not stop until
an occluding region of width $\hat{\Delta}_{i,j}$ or less, including the discontinuity
pixel, is found. Then, for all occluding pixels the bias input of
the  Ayth  neuron  is  corrected  by 

A-UJyay  (0  =  m«"(W  1);  *  =  0, 1 . D).  (28) 

For the batch algorithm, the bias inputs at occluding pixels are
estimated using only the first two image frames.


6  Experimental  Results 

We have tested both the batch and recursive algorithms on two
sequences of natural images taken by a camera moving right to
left. Due to space limitations, we give only one experimental
result here. The images are of size 256 x 233. We arbitrarily
chose five successive frames for testing, although there is no limit
to the number of frames that can be used. Figure 3(a) shows
the first frame. No alignment in the vertical direction was made
and the maximum disparity, about 2 pixels, was measured by
hand. The same parameters were chosen for both algorithms. The
subpixel width W was set at 0.2 and hence D = 10. The parameters
$\lambda$ and $\kappa$ were set at 20 and 5, respectively. The threshold
for occluding pixel detection was set at 150 because the noise
variance is unknown. Figure 3(b) shows the batch result after
45 iterations. The disparity map is represented as an intensity
image with the brightest value denoting the maximum disparity
value. The recursive result is shown in Figure 3(c); the recursive
solution required 20 iterations. Since at occluding regions
we still use the measurement primitives extracted from the first two
image frames for matching, the algorithms might generate some
isolated points or regions (at most [WD] pixels wide) due to the
incorrect information caused by the occlusions. To remove such
points and regions, a median filter is used in our experiments.

(a) The Trees image.

(b) Batch algorithm.

(c) Recursive algorithm.

Figure 3: Disparity maps.

7  Discussion

We have presented two neural network based algorithms, known
as the batch and recursive algorithms, for motion stereo using
the first order derivatives of the intensity function as measurement
primitives. For the recursive algorithm, the bias inputs of the
neurons are recursively updated and, if the intermediate results
are not required, the matching procedure is implemented only
once. Unlike existing recursive algorithms, the disparity field
obtained by our algorithm is smooth and dense. Also, no batch
results are needed for setting the initial states of the neurons.
Both batch and recursive methods gave very good results in
comparison to Barnard's approach [19]. Experimental results
show that the recursive algorithm needs fewer iterations than
the batch algorithm. This is because the recursive algorithm
uses a better bias input updating scheme, especially for the
occluding pixels. The good estimate of the bias inputs makes the
network converge fast, although the updating step for the bias
inputs takes more computation. In view of its parallelism and fast
convergence, the recursive algorithm is useful for real time
implementation, such as in a robot vision system. In our experiment,
the threshold used was 150, which seems somewhat conservative.
However, the maximum disparity is only about 2 pixels, which
means that the width of the occluding region is less than 2 pixels
for two frames and there are only a few occluding pixels along
the right boundaries of the trees. Hence the occluding pixels do
not cause a serious problem in this experiment. This is also why
the number of iterations is not reduced much. We believe that if
the maximum disparity is large and a long sequence of images
is used, then the improvement in occluding pixel detection
will greatly reduce the number of iterations.

Abstract

A  locally  connected  artificial  neural  network  based  on  physiological  and  anatomical  findings  in  the  visual 
system  is  presented  for  motion  perception.  A  set  of  velocity  selective  binary  neurons  is  used  for  each  point 
in  the  image.  Motion  perception  is  carried  out  by  neuron  evaluation  using  a  parallel  updating  scheme. 
Two  algorithms,  batch  and  recursive,  based  on  this  network  are  presented  for  computing  flow  field  from  a 
sequence  of  monocular  images.  The  batch  algorithm  integrates  information  from  all  images  simultaneously 
by  embedding  them  into  the  bias  inputs  of  the  network,  while  the  recursive  algorithm  uses  a  recursive  least 
squares  method  to  update  the  bias  inputs  of  the  network.  Detection  rules  are  also  used  to  find  the  occluding 
elements.  Based  on  information  on  the  detected  occluding  elements,  the  network  automatically  locates 
motion discontinuities. The algorithms need to compute the flow field at most twice. Hence, fewer computations
are needed and the recursive algorithm is amenable to real time applications.


1  Introduction 

Recently,  we  have  developed  an  artificial  neural  network  for  motion  perception  based  on  physiological  and 
anatomical  findings  in  the  visual  system  [1,  2].  The  network  is  discrete,  parallel,  deterministic  and  locally 
connected.  A  set  of  velocity  selective  binary  neurons  is  used  for  each  point  in  the  image.  We  assume  that 
each  neuron  receives  inputs  from  itself  and  other  neighboring  neurons  with  similar  directional  selectivity. 
Motion  perception  is  carried  out  by  neuron  evaluation  using  a  parallel  updating  scheme. 

The  network  has  two  important  features.  First,  it  can  accurately  locate  motion  discontinuities.  Usually, 
a  smoothness  constraint  is  used  for  obtaining  a  flow  field.  Using  a  smoothness  constraint  may  blur  surface 
boundaries  and  hence  motion  discontinuities  may  not  be  detected.  Since  motion  discontinuities  contain  rich 
information  about  the  surface  boundaries  and  the  spatial  arrangement  of  the  objects,  attempts  have  been 
made  to  detect  them  by  using  a  line  process  [3].  However,  without  exactly  knowing  the  occluding  elements, 
the  discontinuities  detected  by  the  line  process  may  be  shifted.  We  first  detect  the  occluding  elements  from 
initial  motion  measurements  and  embed  them  in  the  bias  inputs,  then  let  the  network  automatically  locate 
the  discontinuities.  For  the  purpose  of  real  time  implementation,  both  neurons  and  lines  are  updated  in  a 
parallel  fashion. 

Second,  our  network  can  use  multiple  image  frames  to  compute  the  flow  field.  Natural  images  are  often 
degraded  by  the  imaging  system.  Based  on  such  imperfect  observations,  it  is  difficult  to  compute  the  flow  field 
accurately,  especially  near  motion  depth  discontinuities.  To  improve  the  accuracy  of  the  solution,  multiple 
frames  are  used.  Two  algorithms,  batch  and  recursive,  are  presented.  The  batch  algorithm  simultaneously 
integrates  information  from  all  images  by  embedding  them  into  the  bias  inputs  of  the  network,  while  the 
recursive  algorithm  uses  a  recursive  least  squares  (RLS)  method  to  update  the  bias  inputs  of  the  network. 
Both of these methods need to compute the flow field at most twice. Hence, fewer computations are needed and the
recursive algorithm is amenable to real time applications.

This research work is partially supported by the AFOSR Contract No. F49620-87-C-0007 and the AFOSR Grant No.
86-0196.

2  An  Artificial  Neural  Network

2.1  Physiological  Considerations 

Microelectrode  studies  in  cats  and  monkeys  indicate  that  the  visual  cortex,  a  few  millimeter  thick  neuronal 
tissue,  is  organized  in  a  topographic,  laminar  and  columnar  fashion  [4,  5].  The  image  on  the  retina  is  first 
projected  to  the  lateral  geniculate  bodies  and  then  from  there  to  the  visual  cortex  in  a  strict  topographical 
manner.  The  neurons  in  the  visual  cortex  are  arranged  in  layers  and  grouped  according  to  several  stimulus 
parameters  such  as  eye  dominance,  receptive  field  orientation  and  receptive  field  position.  The  groupings 
take  the  form  of  vertically  arranged  parallel  slabs  spanning  the  full  cortical  thickness.  The  optical  nerve 
fibres  arriving  from  the  lateral  geniculate  bodies  mostly  terminate  in  layer  4  of  visual  area  17,  yielding  a 
cortical  representation  of  retina.  From  area  17  the  visual  signals  pass  to  adjacent  area  18  and  other  higher 
visual  areas  such  as  middle  temporal  (MT),  each  with  a  complete  topographic  map  of  the  visual  field  [6,  7,  8]. 

Figure  1  is  an  idealized  and  speculative  scheme  for  the  hypercolumns  of  the  visual  area  17  [5],  Neurons 
with  similarly  orientation-  and  direction-selectivities  are  stacked  in  discrete  columns  which  are  perpendicular 
to  the  cortical  surface.  For  simplicity,  blobs  and  ocular  dominance  columns  are  not  included.  All  the  neurons 
such  as  simple,  complex  and  hypercomplex  within  a  column  have  the  same  receptive  field  axis  orientation. 
For instance, if an electrode is placed into the cortex in a direction perpendicular to the cortex surface, all
the neurons encountered show the same axis orientation. If the electrode goes in a direction parallel to the
cortex surface, there occurs a regular shift in the axis orientation, about 5° to 10° for every advance of 25 to 50
μm. Over a distance of about 1 mm, there is roughly a full rotation (180°). A set of orientation columns
representing a full rotation of 180° together with an intersecting pair of ocular dominance columns forms a
hypercolumn. Each hypercolumn, an elementary unit of the visual cortex, is responsible for a certain small
area of the visual field and encodes a complete feature description of the area by the activity of neurons.
Advancing  more  than  1  mm  produces  a  displacement  in  the  visual  field,  out  of  the  area  where  one  started  and 
into  an  entirely  new  area.  The  simple  neuron  is  orientation-selective  and  the  complex  neuron  is  direction- 
selective.  The  simple  neuron  responds  best  to  a  stationary  line  which  is  oriented  with  the  axis  of  the  receptive 
field.  For  the  complex  neurons,  not  only  the  orientation  of  the  line  but  also  the  stimulus  speed  and  motion 
direction are important. An oriented line produces a strong response in a complex neuron if it moves at an
optimal speed in a direction perpendicular to the receptive field axis orientation within the receptive field.
About half of the complex neurons respond only to one direction of movement. If the speed is less or greater
than the optimum, the neuron's firing frequency tends to fall off sharply. The optimal speed varies from
neuron to neuron. For instance, in cats it varies from about 0.1°/sec up to about 20°/sec [4]. Hence, the
complex neurons are direction- and speed-selective, i.e., velocity-selective, and area 17 plays a crucial role in
determining velocity selection [9, 10]. We assume that the complex neurons within a column can be further
grouped  according  to  their  speed  selectivity.  Figure  2  shows  a  possible  2-D  grouping  pattern  of  the  complex 
neurons  within  a  hypercolumn.  Each  circle  represents  one  or  more  neurons  since  several  neurons  may  have 
the  same  velocity  selectivity.  The  coordinates  of  the  circle  indicate  the  velocity  selectivity  of  the  neurons. 

MT is also a “motion area”. Neurons in MT are predominantly direction-selective (about 90% show
some direction selectivity and 80% are highly selective) and are arranged in columns according to direction
selectivity [11, 8, 12]. Area 17 projects to area MT in a distinctive way: (1) the projection only happens
between the columns with similar directionality; (2) neurons projecting from a given location in area 17
diverge to several periodically spaced locations in MT, and several locations in area 17 converge upon
a given location in MT. These properties probably play a very important role in maintaining axis and
direction selectivity and forcing the neighboring receptive fields to have the same directional preference.
Figure  3  shows  such  a  projection  pattern. 

2.2  Computational  Considerations 

Usually, images are uniformly sampled by the image digitizer, and computing the flow field amounts to finding the conjugate
points in successive images and computing their displacements. For implementation purposes, we assume that the
neurons in a hypercolumn are uniformly distributed over a 2-D Cartesian plane. Then the conjugate point
can be found by checking every image pixel within a neighborhood in the successive frame based on the
measurement primitives. The maximum search range, i.e., the maximum displacement, can be determined
by the maximum optimal speed. To improve the accuracy of the solution, the velocity component ranges
can be further sampled using bins of size W, where W is a real number.
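
The neighborhood search described above can be sketched in a few lines. This is only an illustration under stated assumptions: the squared intensity difference stands in for the measurement primitives, the function names `match_errors` and `best_displacement` are hypothetical, and the bin size is taken as W = 1 pixel.

```python
def match_errors(frame1, frame2, i, j, max_disp):
    """Squared intensity difference for every candidate displacement (k, l)
    with |k|, |l| <= max_disp; candidates that fall outside frame2 are skipped."""
    rows, cols = len(frame2), len(frame2[0])
    errors = {}
    for k in range(-max_disp, max_disp + 1):
        for l in range(-max_disp, max_disp + 1):
            m, n = i + k, j + l
            if 0 <= m < rows and 0 <= n < cols:
                errors[(k, l)] = (frame1[i][j] - frame2[m][n]) ** 2
    return errors


def best_displacement(frame1, frame2, i, j, max_disp):
    """Candidate with the smallest error; equal errors favor the smaller displacement."""
    errors = match_errors(frame1, frame2, i, j, max_disp)
    return min(errors, key=lambda kl: (errors[kl], kl[0] ** 2 + kl[1] ** 2))


# A single bright pixel that moves one column to the right between frames.
frame1 = [[0, 0, 0, 0],
          [0, 9, 0, 0],
          [0, 0, 0, 0]]
frame2 = [[0, 0, 0, 0],
          [0, 0, 9, 0],
          [0, 0, 0, 0]]
print(best_displacement(frame1, frame2, 1, 1, 1))  # (0, 1)
```

A purely local search like this is ambiguous in uniform regions, which is why the network adds interactions between neighboring pixels.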

We assume that each hypercolumn represents a single image pixel or subpixel (if the image is subsampled).
If the maximum displacement is $D$, then about $(2D+1)^2$ mutually exclusive neurons are needed for each
pixel and a total of $N_r \times N_c \times (2D+1)^2$ neurons are required for an $N_r \times N_c$ image. Since two
objects cannot occupy the same place at the same time, only one velocity value can be assigned to each pixel.
Therefore, in each hypercolumn, only one neuron is in the active state. The velocity value can be determined
according to its direction selectivity. Figure 4 shows such a network with small frames for the hypercolumns
and circles for the neurons. In fact, each small frame contains many neurons. For simplicity, only a few
neurons are shown in each frame. Each neuron receives a bias input from the outside world. The bias input may
consist of several different types of measurement primitives, such as the raw image data, filtered image data
including their derivatives, edges, lines, corners, etc. As the neighboring receptive fields are forced to have
the same directional preference, we assume that neurons with similar velocity selectivity in the neighboring
hypercolumns tend to affect each other through receiving inputs from each other, as shown in Figure 4. This
feature implies the smoothness constraint, which can be seen more clearly if the network is organized in
a multi-layer fashion. Figure 5 shows a multi-layer network which is equivalent to the original one. The
network consists of $(2D_k+1) \times (2D_l+1)$ layers. Each layer corresponds to a different velocity and contains
$N_r \times N_c$ neurons. Each neuron receives excitatory and inhibitory inputs from itself and other neurons in a
neighborhood in the same layer. For each point, only the neuron that has the maximum excitation among
all neurons in the other layers is on and the others are off. When the neuron at the point $(i,j)$ in the $(k,l)$th
layer is 1, this means that the velocities in the $k$ and $l$ directions at the point $(i,j)$ are $kW$ and $lW$,
respectively.
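
To give a sense of scale for the layered network, the neuron count can be tabulated directly. The function name below is hypothetical; the example values (a 120 × 120 image with $D_k = 7$ and $D_l = 1$) are the ones quoted later in the Results section.

```python
def neuron_count(n_rows, n_cols, d_k, d_l):
    """Return (total neurons, number of layers): one layer per candidate
    velocity (k, l), and one neuron per pixel in every layer."""
    layers = (2 * d_k + 1) * (2 * d_l + 1)
    return n_rows * n_cols * layers, layers


total, layers = neuron_count(120, 120, 7, 1)
print(layers, total)  # 45 648000
```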

Formally, the multi-layer network can be described as follows. Let $V = \{v_{i,j,k,l},\ 1 \le i \le N_r,\ 1 \le j \le N_c,\ -D_k \le k \le D_k,\ -D_l \le l \le D_l\}$
be a binary state set of the neural network with $v_{i,j,k,l}$ denoting the
state of the $(i,j,k,l)$th neuron, which is located at point $(i,j)$ in the $(k,l)$th layer, $T_{i,j,k,l;m,n,k,l}$ the synaptic
interconnection strength from neuron $(i,j,k,l)$ to neuron $(m,n,k,l)$ and $I_{i,j,k,l}$ the bias input.

At each step, each neuron $(i,j,k,l)$ synchronously receives inputs from itself and neighboring neurons
and a bias input

$$u_{i,j,k,l} = \sum_{(m-i,\,n-j) \in S_0} T_{i,j,k,l;m,n,k,l}\, v_{m,n,k,l} + I_{i,j,k,l} \qquad (1)$$

where $S_0$ is an index set for all neighbors in a $\Gamma \times \Gamma$ window centered at point $(i,j)$. The potential of the
neuron, $u_{i,j,k,l}$, is then fed back to the corresponding neurons after maximum evolution

$$v_{i,j,k,l} = g(u_{i,j,k,l}) \qquad (2)$$

where $g(x_{i,j,k,l})$ is a maximum evolution function (also called a winner-take-all function)

$$g(x_{i,j,k,l}) = \begin{cases} 1 & \text{if } x_{i,j,k,l} = \max(x_{i,j,p,q};\ -D_k \le p \le D_k,\ -D_l \le q \le D_l), \\ 0 & \text{otherwise.} \end{cases} \qquad (3)$$
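
One update of a single pixel, following (1) to (3), might look like the sketch below. The constant weights `t_self` and `t_nbr`, the 3 × 3 window, and the dictionary layout of the state are all assumptions made for illustration, not the weights used in the paper. Ties are resolved by taking the first maximal entry, an arbitrary choice for this sketch.

```python
def potential(v, bias, i, j, k, l, t_self=-1.0, t_nbr=0.5, half=1):
    """Eq. (1) for neuron (i, j, k, l) with hypothetical constant weights.
    v maps (i, j, k, l) -> 0/1; neurons missing from v count as 0."""
    u = bias[(i, j, k, l)]
    for di in range(-half, half + 1):
        for dj in range(-half, half + 1):
            weight = t_self if (di, dj) == (0, 0) else t_nbr
            u += weight * v.get((i + di, j + dj, k, l), 0)
    return u


def winner_take_all(u_by_velocity):
    """Eqs. (2)-(3): switch on only the neuron with the largest potential."""
    winner = max(u_by_velocity, key=u_by_velocity.get)
    return {kl: int(kl == winner) for kl in u_by_velocity}


# Two active neurons in layer (1, 0); layer (0, 0) is silent.
v = {(0, 0, 1, 0): 1, (0, 1, 1, 0): 1}
bias = {(0, 0, 1, 0): -2.0, (0, 0, 0, 0): -3.0}
u = {kl: potential(v, bias, 0, 0, *kl) for kl in [(1, 0), (0, 0)]}
states = winner_take_all(u)
print(u, states)
```

Here the neuron in layer (1, 0) wins at pixel (0, 0) because its bias plus neighbor support exceeds the bias of the zero-velocity neuron.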


The neuron evaluation will be terminated if the network converges, i.e., the energy function of the network
defined by

$$E = -\sum_{i=1}^{N_r} \sum_{j=1}^{N_c} \sum_{k=-D_k}^{D_k} \sum_{l=-D_l}^{D_l} \Big( \frac{1}{2} \sum_{(m-i,\,n-j) \in S_0} T_{i,j,k,l;m,n,k,l}\, v_{i,j,k,l}\, v_{m,n,k,l} + I_{i,j,k,l}\, v_{i,j,k,l} \Big) \qquad (4)$$

reaches a minimum.


3  Computing  Flow  Field 

A smoothness constraint is used for obtaining a smooth optical flow field and a line process is employed for
detecting motion discontinuities. The line process consists of vertical and horizontal lines, $L^v$ and $L^h$. Each
line can be in either one of two states: 1 for acting and 0 for resting. The error function for computing the
flow field can be properly expressed as

$$\begin{aligned} E = \sum_{i=1}^{N_r} \sum_{j=1}^{N_c} \sum_{k=-D_k}^{D_k} \sum_{l=-D_l}^{D_l} \Big\{ & \big[ A \big( (k_{11}(i,j) - k_{21}(i+k,j+l))^2 + (k_{12}(i,j) - k_{22}(i+k,j+l))^2 \big) \\ & + (g_1(i,j) - g_2(i+k,j+l))^2 \big]\, v_{i,j,k,l} + \frac{B}{2} \sum_{s \in S} (v_{i,j,k,l} - v_{(i,j)+s,k,l})^2 \\ & + \frac{C}{2} \big[ (v_{i,j,k,l} - v_{i+1,j,k,l})^2 (1 - L^v_{i,j,k,l}) + (v_{i,j,k,l} - v_{i,j+1,k,l})^2 (1 - L^h_{i,j,k,l}) \big] \Big\} \qquad (5) \end{aligned}$$

where $k_{11}(i,j)$ and $k_{12}(i,j)$ are the principal curvatures of the first image, $k_{21}(i+k,j+l)$ and $k_{22}(i+k,j+l)$
are the principal curvatures of the second image, $\{g_1(i,j)\}$ and $\{g_2(i+k,j+l)\}$ are the intensity values of
the first and second images, respectively, $S = S_0 - (0,0)$ is an index set excluding $(0,0)$, and $A$, $B$ and $C$ are
constants. The principal curvatures can be estimated by using a polynomial fitting technique [1].

The first term in (5) seeks velocity values such that all points of the two images are matched as closely
as possible in a least squares sense. The second term, weighted by $B$, is the smoothness constraint on the
solution, and the third term, weighted by $C$, is a line process to weaken the smoothness constraint and to
detect motion discontinuities. The constant $A$ in the first term determines the relative importance of the
intensity values and their principal curvatures to achieve the best results. Note that the error function does
not contain a line process penalty term. Suppose we add a penalty term $D(L^v_{i,j,k,l} + L^h_{i,j,k,l})$ to (5) and define
the potential of the line, say the vertical line, as

$$\varphi_{i,j,k,l} = \frac{C}{2} (v_{i,j,k,l} - v_{i+1,j,k,l})^2 - D. \qquad (6)$$

By thresholding the potential $\varphi_{i,j,k,l}$ at zero, the new state of the line takes 1 or 0 accordingly. When
$C > 2D$, if two connected neighboring points have different velocity values, then the penalty term cannot
suppress the line process and a line will be turned on. When $C < 2D$, no line will be turned on even if there
exists a difference between the two points, which means no penalty is needed. Hence, adding the penalty
term or not makes no difference. Basically, the line process does nothing but change the smoothing weights.
For instance, if all lines are on, the weights are all the same, i.e., $2B$. If all lines are off, then the weights at the
four nearest neighbors of the center point are increased by $C$. The line process weakens the smoothness
constraint by changing the smoothing weights, resulting in space-variant smoothing weights.

By choosing the interconnection strengths and bias inputs as

$$\begin{aligned} T_{i,j,k,l;m,n,k,l} = {} & -\big[ 48B + C (4 - L^v_{i,j,k,l} - L^v_{i-1,j,k,l} - L^h_{i,j,k,l} - L^h_{i,j-1,k,l}) \big]\, \delta_{i,m} \delta_{j,n} \\ & + C \big[ (1 - L^v_{i-1,j,k,l})\, \delta_{i-1,m} \delta_{j,n} + (1 - L^h_{i,j-1,k,l})\, \delta_{i,m} \delta_{j-1,n} + (1 - L^v_{i,j,k,l})\, \delta_{i+1,m} \delta_{j,n} \\ & + (1 - L^h_{i,j,k,l})\, \delta_{i,m} \delta_{j+1,n} \big] + 2B \sum_{s \in S} \delta_{(i,j),(m,n)+s} \qquad (7) \end{aligned}$$

$$I_{i,j,k,l} = -A \big[ (k_{11}(i,j) - k_{21}(i+k,j+l))^2 + (k_{12}(i,j) - k_{22}(i+k,j+l))^2 \big] - (g_1(i,j) - g_2(i+k,j+l))^2 \qquad (8)$$

where $\delta_{a,b}$ is the Kronecker delta function, the error function (5) is transformed into the energy function (4) of
the neural network. Note that the interconnection strengths consist of constants and the line process only. The
bias inputs contain all information from the images. This is why we like to call $I_{i,j,k,l}$ the bias input instead of
the threshold, because the neurons receive the information from outside through the bias inputs. Equation (8)
also gives us a hint to develop multiple frame algorithms. The size of the smoothing window used in (7) is
5 × 5.
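
As a check on the constant $48B$ in (7), the weights contributed by the smoothness term alone can be worked out by hand. The sketch below assumes a Hopfield-type energy convention, $E = -\frac{1}{2}\sum_a \sum_b T_{ab} v_a v_b - \sum_a I_a v_a$, and ignores image borders. The shorthand $a$, $b$ for two neurons in the same layer is introduced here only for this derivation.

```latex
% Smoothness term of (5) within one layer; a 5 x 5 window gives |S| = 24 neighbors.
\begin{align*}
\frac{B}{2}\sum_{a}\sum_{s\in S}(v_a - v_{a+s})^2
  &= \frac{B}{2}\sum_{a}\sum_{s\in S}\left(v_a^2 - 2\,v_a v_{a+s} + v_{a+s}^2\right) \\
  &= \frac{B}{2}\,(24 + 24)\sum_a v_a^2 \;-\; B\sum_a\sum_{s\in S} v_a v_{a+s} \\
  &= 24B\sum_a v_a^2 \;-\; B\sum_a\sum_{s\in S} v_a v_{a+s}.
\end{align*}
% Matching term by term against -(1/2) sum_a sum_b T_ab v_a v_b:
\begin{align*}
-\tfrac{1}{2}\,T_{aa} = 24B \;\Rightarrow\; T_{aa} = -48B,
\qquad
-\tfrac{1}{2}\,T_{ab} = -B \;\Rightarrow\; T_{ab} = 2B \quad (b = a+s,\ s\in S).
\end{align*}
```

Under these assumptions the self weight comes out as $-48B$ and each neighbor weight as $2B$. The line-process term then adds the $C$-dependent corrections at the center and its four nearest neighbors.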

Computation of the flow field is carried out by neuron evaluation. As the first and second terms in (5) do
not contain the line process, we can update the line process prior to updating the neurons. Let $L^{v,new}_{i,j,k,l}$
and $L^{v,old}_{i,j,k,l}$ denote the new and old states of the vertical line $L^v_{i,j,k,l}$, respectively. Let $\varphi_{i,j,k,l}$ be the potential
of the vertical line $L^v_{i,j,k,l}$ given by

$$\varphi_{i,j,k,l} = \frac{C}{2} (v_{i,j,k,l} - v_{i+1,j,k,l})^2. \qquad (9)$$

Then, the new state is determined by

$$L^{v,new}_{i,j,k,l} = \begin{cases} 1 & \text{if } \varphi_{i,j,k,l} > 0, \\ 0 & \text{otherwise.} \end{cases} \qquad (10)$$

It is easy to see that whenever the states of neurons $v_{i,j,k,l}$ and $v_{i+1,j,k,l}$ are different, the vertical line $L^v_{i,j,k,l}$
will be active provided that the parameter $C$ is greater than zero. If $C = 0$, then all lines have zero potential
and will be inactive. Since the line process weakens the smoothness constraint, the choice of $C$ is closely
related to selecting the smoothness parameter $B$ in (5). A similar updating scheme is also used for the
horizontal lines.
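
The vertical-line update of (9) and (10) can be sketched for a single velocity layer as follows. The list-of-lists layout and the function name are assumptions made for illustration.

```python
def update_vertical_lines(layer, C):
    """layer: 2-D list of binary neuron states for one velocity layer.
    Returns lines[i][j], the state of the line between pixels (i, j) and (i + 1, j)."""
    lines = []
    for i in range(len(layer) - 1):
        row = []
        for j in range(len(layer[0])):
            phi = 0.5 * C * (layer[i][j] - layer[i + 1][j]) ** 2  # Eq. (9)
            row.append(1 if phi > 0 else 0)                       # Eq. (10)
        lines.append(row)
    return lines


layer = [[1, 1, 0],
         [1, 0, 0]]
print(update_vertical_lines(layer, C=50))  # [[0, 1, 0]]
print(update_vertical_lines(layer, C=0))   # [[0, 0, 0]]
```

A line switches on only where the two vertically adjacent neurons disagree, and never when C = 0, matching the remark above.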

For each neuron, we use (1) and (2) to synchronously evaluate its state and readjust accordingly. The
initial state of the neurons is set as

$$v_{i,j,k,l} = \begin{cases} 1 & \text{if } I_{i,j,k,l} = \max(I_{i,j,p,q};\ -D_k \le p \le D_k,\ -D_l \le q \le D_l), \\ 0 & \text{otherwise,} \end{cases} \qquad (11)$$

where $I_{i,j,k,l}$ is the bias input. The initial conditions are completely determined by the bias inputs, the
information from the outside world. If there are two maximal bias inputs at point $(i,j)$, then only the neuron
corresponding to the smallest velocity is initially set at 1 and the other one is set at 0. This is consistent
with the minimal mapping theory [13]. In the updating scheme, we also use the minimal mapping theory
to handle the case of two neurons having the same largest inputs. When the network reaches a stable state,
the flow field is determined by the neuron states.
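
The initialization with its tie-breaking rule can be sketched for one pixel as below. Measuring the "smallest velocity" by the squared magnitude $k^2 + l^2$ is an assumption of this sketch, as are the function name and the example bias values.

```python
def initial_state(bias_at_pixel):
    """bias_at_pixel: dict (k, l) -> bias input at one pixel.
    Activates the neuron with the largest bias; among equal maxima the
    smallest velocity magnitude wins (minimal mapping)."""
    best = max(bias_at_pixel.values())
    tied = [kl for kl, b in bias_at_pixel.items() if b == best]
    chosen = min(tied, key=lambda kl: kl[0] ** 2 + kl[1] ** 2)
    return {kl: int(kl == chosen) for kl in bias_at_pixel}


# Velocities (1, 0) and (2, 1) share the largest bias; the smaller one is chosen.
bias_at_pixel = {(0, 0): -4.0, (1, 0): -1.0, (2, 1): -1.0, (0, 1): -9.0}
state = initial_state(bias_at_pixel)
print(state)
```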



4  Detection  of  Motion  Discontinuities 


Motion discontinuities in the flow field often result from occluding contours of moving objects. In this case,
the moving objects are projected onto the image plane as adjacent surfaces and their boundaries undergo
either split or fusion motion [13]. These motion situations give rise to discontinuities along their boundaries.
To overcome such difficulties and to locate the discontinuities more accurately, we first detect the
occluding elements based on the initial motion measurements. Then all the information about the occluding
elements can be embedded into the bias inputs such that the network can automatically take care of motion
discontinuities during the updating procedure.

Suppose that the surfaces are translating with constant velocities. Let us consider the case in which
a surface is moving against a stationary background as shown in Figure 6. Let $X_1$ denote the occluding
element, $Y_2$ and $X_2$ the corresponding elements of $Y_1$ and $X_1$, respectively. Let $(i,j)$ be the coordinates of
element $Y_1$, and $d_i$ and $d_j$ the $i$ and $j$ components of the flow at $(i,j)$, respectively. We assume that $Y_2$ and $X_2$ are
located at $(i+d_i, j+d_j)$ and $(i+2d_i, j+2d_j)$, respectively. By defining the match errors

$$e_1(i,j) = -I_{i,j,d_i,d_j}, \qquad e_2(i,j) = -I_{i+d_i,j+d_j,0,0},$$

$$e_3(i,j) = -I_{i+d_i,j+d_j,d_i,d_j}, \qquad e_4(i,j) = -I_{i+2d_i,j+2d_j,0,0}$$

where $I_{i,j,k,l}$ are the bias inputs given in (8), the following relations hold under orthographic or perspective
projection for the case when there is no motion along the optical axis,

$$e_1(i,j) < e_2(i,j) \quad \text{and} \quad e_4(i,j) < e_3(i,j). \qquad (12)$$

Note that if the above relations do not hold then the element $X_1$ is not an occluding element. Hence, it is
natural to use the relations (12) for detecting the occluding elements.

Detection rule: An occluding element is detected at $(i+d_i, j+d_j)$ if the flow has nonzero values at $(i,j)$,

$$\bar{e}_2(i,j) - \bar{e}_1(i,j) > T \quad \text{and} \quad \bar{e}_3(i,j) - \bar{e}_4(i,j) > T \qquad (13)$$

where the threshold $T$ is a nonnegative number and $\bar{e}_k(i,j)$ are the average values of the matching errors
within a $\Gamma_t \times \Gamma_t$ window $S_t$,

$$\bar{e}_k(i,j) = \frac{1}{|S_t|} \sum_{s \in S_t} e_k((i,j)+s) \quad \text{for } k = 1, 2, 3 \text{ and } 4.$$

For digitized natural images, $T$ usually takes a nonzero value to reduce the effects of quantization error
and noise. Using a large value for $T$ can eliminate false occluding elements but may miss the true ones.
Since the flow at an occluding point has zero values, the a priori knowledge about the occluding elements can
be embedded in the bias inputs by setting (for instance at point $(i+d_i, j+d_j)$)

$$I_{i+d_i,j+d_j,0,0} = \max(I_{i+d_i,j+d_j,k,l};\ -D_k \le k \le D_k,\ -D_l \le l \le D_l). \qquad (14)$$

Accordingly, the neural network will prefer zero flow at these points and therefore the line process can
precisely locate motion discontinuities.
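
One way to read the embedding step is sketched below for a single detected pixel. It assumes that the zero-velocity bias is raised to the largest bias at that pixel and that ties are resolved toward the smallest velocity, so zero flow is selected. The function names and example values are hypothetical.

```python
def embed_occlusion(bias_at_pixel):
    """Return a copy in which the zero-velocity bias equals the largest bias
    at this pixel (assumed reading of the embedding step)."""
    updated = dict(bias_at_pixel)
    updated[(0, 0)] = max(bias_at_pixel.values())
    return updated


def preferred_velocity(bias_at_pixel):
    """Velocity with the largest bias; ties go to the smallest magnitude."""
    best = max(bias_at_pixel.values())
    tied = [kl for kl, b in bias_at_pixel.items() if b == best]
    return min(tied, key=lambda kl: kl[0] ** 2 + kl[1] ** 2)


bias_at_pixel = {(0, 0): -7.0, (1, 0): -2.0, (0, 1): -5.0}
before = preferred_velocity(bias_at_pixel)
after = preferred_velocity(embed_occlusion(bias_at_pixel))
print(before, after)  # (1, 0) (0, 0)
```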


5  Multiple  Frame  Approaches 

When image quality is poor, measurement primitives estimated from these images are not accurate or
reliable. For instance, if the images are blurred by motion, then the local features are smeared and much of
the information is lost. In particular, the object boundaries become wide and the derivatives of the intensity
function become small at the boundaries. Based on these low quality measurements, motion discontinuities
cannot be correctly located and hence the flow field cannot be accurately computed. To improve the accuracy,
one way is to improve the image quality by using some image restoration techniques to remove degradations.
However, without a priori knowledge of the degradations, such as the blur function, an image cannot be restored
perfectly. When the blur is ill conditioned, it is still difficult to restore the image even if the blur function
is given. An alternative is to compute the flow field over a long time interval, i.e. using multiple frames. In this
section, two algorithms, batch and recursive, using more than two frames of images are presented.


5.1  Batch  Algorithm 

Assume that the objects in the scene are moving with a constant-velocity translational motion. It is
interesting to note that in the two frame case the bias inputs (8) contain nothing but the measurement
primitives, the intensity values and principal curvatures, which are estimated from images. The bias inputs
of the network are completely determined by the observations, i.e. the images, while the interconnection
strengths (7) do not contain any observations. All these facts suggest that any information from the outside
world can be included in the bias inputs. The network learns all information directly from the inputs. Hence,
it is natural to extend the two frame approach to multiple frames by adding more observations to the bias
inputs. Assuming $M$ frames of images are available, the bias inputs are given by

$$\begin{aligned} I_{i,j,k,l} = -\sum_{r=1}^{M-1} \Big\{ & A \big[ (k_{r1}(i+rk-k, j+rl-l) - k_{(r+1)1}(i+rk, j+rl))^2 + (k_{r2}(i+rk-k, j+rl-l) \\ & - k_{(r+1)2}(i+rk, j+rl))^2 \big] + (g_r(i+rk-k, j+rl-l) - g_{r+1}(i+rk, j+rl))^2 \Big\}. \qquad (15) \end{aligned}$$

Accordingly, the initial state of the neurons (11) is set by using these new bias inputs. The two frame
algorithm presented in section 2.2 can be used without any modifications for the multiple frame case.


5.2  Recursive  Algorithm 

If not all the images are available at the same time or if one wants to compute the flow field in real time, a
recursive algorithm can be used. The recursive algorithm uses an RLS algorithm to update the bias inputs.
First, the initial condition for the bias inputs is set to zero, i.e. $I_{i,j,k,l}(0) = 0$. This is reasonable, because
there is no information available at the beginning. Then, whenever a new frame becomes available, the bias
inputs can be updated by

$$I_{i,j,k,l}(r) = I_{i,j,k,l}(r-1) + \frac{1}{r} \big( \tilde{I}_{i,j,k,l}(r) - I_{i,j,k,l}(r-1) \big) \quad \text{for } 2 \le r \le M \qquad (16)$$

where $\tilde{I}_{i,j,k,l}(r)$ is a new observation given by

$$\begin{aligned} \tilde{I}_{i,j,k,l}(r) = {} & -A \big[ (k_{r1}(i+rk-k, j+rl-l) - k_{(r+1)1}(i+rk, j+rl))^2 + (k_{r2}(i+rk-k, j+rl-l) \\ & - k_{(r+1)2}(i+rk, j+rl))^2 \big] - (g_r(i+rk-k, j+rl-l) - g_{r+1}(i+rk, j+rl))^2. \qquad (17) \end{aligned}$$

In fact, this RLS algorithm is equivalent to the batch algorithm. If the intermediate results are not required,
the flow field can be computed after all images are received. As one can see, the RLS algorithm is parallel in
nature and very few computations are required at each step. Hence this algorithm is extremely fast and can
be implemented in real time.
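
The relation between the recursive and batch forms can be checked numerically for one neuron. The sketch assumes an update gain of 1/r and starts the index at r = 1 for the first frame pair; both are assumptions of this sketch, and the observation values are made up. Under these assumptions the recursion yields the running mean of the observations, which is the batch sum divided by the number of frame pairs. A positive rescaling of this kind does not change which velocity has the largest bias at a pixel.

```python
from fractions import Fraction


def recursive_bias(observations):
    """Running update I(r) = I(r-1) + (1/r) * (obs(r) - I(r-1)), with I(0) = 0."""
    bias = Fraction(0)
    for r, obs in enumerate(observations, start=1):
        bias += Fraction(1, r) * (Fraction(obs) - bias)
    return bias


observations = [-4, -1, -7]      # hypothetical per-frame-pair observations
batch_sum = sum(observations)    # batch form: plain sum over frame pairs
running = recursive_bias(observations)
print(running, batch_sum)        # -4 -12
```

Exact fractions are used so that the equality between the two forms can be tested without rounding error.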

Since  the  number  of  occluding  elements  dramatically  increases  as  more  image  frames  are  involved,  special 
attention  has  to  be  paid  to  this  problem.  With  minor  modifications,  the  detection  criterion  used  for  the  two 
frames  can  be  extended  to  multiple  frames  [1]. 

The  multiple  frame  based  algorithms  can  be  summarized  as 

1.  Compute  flow  field  from  the  first  two  frames. 

2.  Update  bias  inputs. 

3.  Detect  occluding  elements  and  reset  bias  inputs  accordingly.  For  the  recursive  algorithm,  go  back 
to  step  2  if  the  incoming  frame  is  not  the  last  one;  otherwise,  go  to  step  4. 

4.  Compute  flow  field  with  updated  bias  inputs. 


6  Results

Figure 7 shows the first and fourth frames of a sequence of images of a pick-up truck moving from right to left
against a stationary background. The images are of size 480 × 480. As the rear part of the truck is missing
in the first frame, we reversed the order of the image sequence so that there is a complete truck image in the
first frame. Accordingly, the direction of the computed flow field should be reversed. For the two frame
approach, we used the fourth frame as the first frame and the third frame as the second frame. To reduce
computation, the image size was reduced to 120 × 120 by subsampling.

For each point, we use two memories in the range $-D_k$ to $D_k$ and $-D_l$ to $D_l$ to represent velocities in the $i$
and $j$ directions, respectively, instead of using $(2D_k+1)$ and $(2D_l+1)$ neurons. Due to local connectivity,
the potential of the neuron is computed only within a small window. By setting $A = 2$, $B = 250$, $C = 50$,
$D_k = 7$ and $D_l = 1$, the flow field was obtained after 36 iterations. A 48 × 113 sample of the computed flow
field corresponding to the part framed by black lines in Figure 7(b) is given in Figure 8. Since the shutter
speed was low, the truck was heavily blurred by the motion. The motion blur smeared the edges and erased
local features, especially the features on the wheel. Hence, it is difficult to detect the rotation of the wheels.
Also note that although most of the boundary locations are correct, the boundaries due to the fusion motion,
such as the rear part of the truck and the driver's cab, are shifted by the line process.

The occluding pixels were detected at $T = 100$ based on the initially computed flow field of Figure 8.
By embedding the information about the occluding pixels into the bias inputs, using the initially computed
optical flow as the initial conditions and choosing $A = 2$, $B = 188$, $C = 200$, $D_k = 7$ and $D_l = 1$, the final
result shown in Figure 9 was obtained after 13 iterations. The accuracy of boundary location is significantly
improved.

For the multiple frame approaches, we used four image frames. Theoretically there is no limit to the
number of frames that can be used in the batch approach. For the same reason mentioned before, the
fourth frame was taken as the first frame, the third frame as the second frame, etc. Since the batch and
recursive algorithms are similar in spirit, a set of identical parameters was used for both algorithms in this
experiment, and the same results were obtained. The occluding pixels were also detected at $T = 100$ from
four frames. Figure 10 shows the flow field computed from four frames using the occluding pixel information.
The parameters used were $A = 4$, $B = 850$, $C = 80$, $D_k = 7$ and $D_l = 1$, and 12 iterations were required.
As expected, the output is much cleaner and the boundaries are more accurate than those of the two frame
based approach. The number of iterations is also reduced.
Figure 1: An idealized and speculative scheme for the hypercolumns of visual area 17. The blocks
denoted by thick lines represent hypercolumns containing complete sets of orientation columns. The thin
lines separate individual orientation columns. L and R denote the areas corresponding to left and right
eyes.


Figure 4: An artificial neuron network. Each small frame denotes a hypercolumn. The neurons in a
hypercolumn are uniformly distributed on a plane.


Figure 5: A multi-layer network which is equivalent to the original one. The neurons are arranged in
layers according to their velocity selectivity. $i$ and $j$ denote the image coordinates. $k$ and $l$ denote the
velocity coordinates.




Figure 2: A 2-D grouping pattern for the neurons within a hypercolumn. Neurons are arranged
according to their direction and speed selectivity.
Figure 3: Projection pattern from area 17 to MT. For simplicity, only the cross-sections of the four
different orientation selective hypercolumns are shown for each area.


Figure 6: Detection of the occluding element. (a) One moving object. (b) Detection scheme.
Figure 7: A sequence of pick-up truck images. (a) The first frame. (b) The fourth frame.


Figure 8: Flow field computed from two frames.
Abstract

A method for matching stereo images using a neural network is presented. Usually, the measurement
primitives used for stereo matching are the intensity values, edges and linear features. Conventional
methods based on such primitives suffer from amplitude bias, edge sparsity and noise distortion.
We first fit a polynomial to find a smooth continuous intensity function in a window and estimate the
first order intensity derivatives. A neural network is then employed to implement the matching procedure
under the epipolar, photometric and smoothness constraints based on the estimated first order
derivatives. Owing to the dense intensity derivatives, a dense array of disparities is generated with only
a few iterations. This method does not require surface interpolation. Computer simulations to
demonstrate the efficacy of our method are presented.

1  Introduction 

Stereo matching is a primary means for recovering 3-D depth from two images taken from different
viewpoints. The two central problems in stereo matching are to match the corresponding points and to
obtain a depth map or disparity values between these points. In this paper we present a method
for computing the disparities between the corresponding points in two images recorded simultaneously
from a pair of laterally displaced cameras based on the first order intensity derivatives.
An implementation using a neural network is also given.

Basically, there exist two types of stereo matching methods, region based and feature based, according
to the nature of the measurement primitives. The region based methods use the intensity values as the
measurement primitives. A correlation technique or some simple modification is applied to a certain
local region around the pixel to evaluate the quality of matching. The region based methods usually
suffer from problems due to the lack of local structure in homogeneous regions, amplitude bias between
the images and noise distortion. Recently, Barnard [1] applied a stochastic optimization approach to the
stereo matching problem to overcome the difficulties due to homogeneous regions and noise distortion.
Although this approach is different from the conventional region based methods, it still uses intensity
values as the primitives with the aid of a smoothness constraint. Barnard's approach has several
advantages: it is simple, suitable for parallel processing and produces a dense disparity map. However,
the large number of iterations, a common problem with the simulated annealing algorithm, makes
it unattractive. It also suffers from the problem of amplitude bias between the two images.

The feature based methods use intensity edges or linear features (for example, see Grimson [2] and
Medioni [3]) or intensity peaks which correspond to discontinuities in the first order
derivatives of intensity [4]. The intensity edges are obtained using edge detectors such as the
Marr-Hildreth edge detector [5] or the Nevatia-Babu line finder [6]. Since amplitude bias
and a small amount of noise do not affect edge detection, feature based methods can handle natural
images more efficiently. Owing to the fewer measurement primitives to deal with, feature based
methods are usually faster than the region based methods. However, a surface interpolation step has to
be included. In order to obtain smooth surface interpolations, several types of smoothing constraint
techniques have been introduced [2]. The common problem in feature based methods is that if features
are sparse, then the surface interpolation step is usually difficult.

Julesz's example of random dot stereograms shows that stereo matching occurs very early in the visual
process [7]. The question is what kind of measurement primitives the human stereo process uses. Since
the amplitude bias can be eliminated by a differential operation, the intensity derivatives
are dense, and the human visual system is sensitive to intensity changes, the first order intensity
derivatives (the simplest derivatives) may be considered appropriate measurement primitives for the
stereo matching problem. Noise distortion, to which the first order derivatives are very sensitive, can
be reduced by smoothing techniques such as polynomial fitting. The first order intensity derivatives
can be obtained by directly differentiating the resulting continuous
intensity function.

Recently, many researchers have been using neural networks
for stereo matching based on either intensity values or edges
[8,9,10]. Early work on extracting depth information from
random dot stereograms using a neural network can be found in
[11]. In this paper, we use a neural network with a maximum
evolution function to solve the stereo matching problem based
on the first order intensity derivatives under the epipolar,
photometric and smoothness constraints. We illustrate the
usefulness of this approach using random dot stereograms. Due
to lack of space, natural image examples are not given in this
paper.

2  Estimation of the First Order Intensity Derivatives

Natural digital images are usually corrupted by a certain
amount of noise due to the electronic imaging sensor, film
granularity and quantization error. The derivatives obtained
by applying a difference operator to digital images are
therefore not reliable. Since a digital image comes about by
sampling an analog image on an equally spaced lattice, a
proper way to recover a smooth and continuous image surface is
by a polynomial fitting technique. We assume first that a
point in the right image corresponding to a specified point in
the left image lies somewhere on the corresponding epipolar
line, which is parallel to the row coordinate, i.e. in a
horizontal direction, and second that in each neighborhood of
the image the underlying intensity function can be
sufficiently approximated by a 4th order polynomial. The first
assumption is also known as the epipolar constraint. With the
help of this constraint, the first order derivatives we need
for matching are computed only for the horizontal direction.
Under the second assumption, the intensity function in a
window of size 2r + 1, centered at the point (i, j), is fitted
by a polynomial of the form

    g(i, j + x) = a_1 + a_2 x + a_3 x^2 + a_4 x^3 + a_5 x^4        (1)


where x lies in the range -r to +r and {a_m} are coefficients.
The first order intensity derivative at point (i, j) can easily
be obtained by taking the derivative of g(i, j + x) with
respect to x and then setting x = 0:

    g'(i, j) = \frac{d g(i, j + x)}{dx} \Big|_{x=0} = a_2        (2)

In order to estimate each coefficient independently, an
orthogonal polynomial basis set is used. Several existing
orthogonal polynomial basis sets can be found in [12,13]. We
use the discrete Chebyshev polynomial basis set, also used by
Haralick for edge detection and topographic classification
[14,15]. The important property of using polynomials is that
low order fits over a large window can reduce the effects of
noise and give a smooth function.

Let a set of discrete Chebyshev polynomials be defined over an
index set R = {-r, -r + 1, ..., r - 1, r}, i.e. over a window
of size 2r + 1, as

    Ch_0(x) = 1
    Ch_1(x) = x
    Ch_2(x) = x^2 - q_2 / q_0
    Ch_3(x) = x^3 - (q_4 / q_2) x
    Ch_4(x) = x^4 + \frac{(q_2 q_4 - q_0 q_6) x^2 + (q_2 q_6 - q_4^2)}{q_0 q_4 - q_2^2}        (3)

where q_n = \sum_{u \in R} u^n. With the window centered at
point (i, j), the intensity function g(i, j + x) for each
x \in R can be obtained as


    \hat{g}(i, j + x) = \sum_{m=0}^{4} d_m Ch_m(x)        (4)

where \hat{g}(i, j + x) denotes the approximated continuous
intensity function. By minimizing the least square error in
estimation and taking advantage of the orthogonality of the
polynomial set, the coefficients {d_m} are obtained as


    d_m = \frac{\sum_{x \in R} Ch_m(x) g(i, j + x)}{\sum_{u \in R} Ch_m^2(u)}        (5)

where  {g(i,j  +  x)}  are  the  observed  data  values. 

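Expanding (4) and comparing with terms in (1), the first order
intensity derivative coefficient a_2 is given by

    a_2 = d_1 - \frac{q_4}{q_2} d_3
        = \sum_{x \in R} M(x) g(i, j + x)        (6)

where q_n = \sum_{u \in R} u^n and M(x) is determined by

    M(x) = \frac{Ch_1(x)}{\sum_{u \in R} Ch_1^2(u)} - \frac{q_4}{q_2} \frac{Ch_3(x)}{\sum_{u \in R} Ch_3^2(u)}        (7)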


From  (6)  one  can  see  that  M(x)  is  a  filter  for  detecting  intensity 
changes. 
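
The filter M(x) of (7) can be tabulated directly. The following
is a minimal sketch, assuming NumPy and assuming the odd basis
polynomials take the standard forms Ch_1(x) = x and
Ch_3(x) = x^3 - (q_4/q_2) x; the function names are
illustrative and are not taken from the paper.

```python
import numpy as np


def derivative_filter(r):
    """Tabulate M(x) of (7) for x = -r, ..., r."""
    x = np.arange(-r, r + 1, dtype=float)
    q2 = np.sum(x ** 2)                 # q_n = sum of x^n over the window
    q4 = np.sum(x ** 4)
    ch1 = x                             # Ch_1(x), assumed standard form
    ch3 = x ** 3 - (q4 / q2) * x        # Ch_3(x), assumed standard form
    return ch1 / np.sum(ch1 ** 2) - (q4 / q2) * ch3 / np.sum(ch3 ** 2)


def horizontal_derivative(row, r):
    """Estimate a_2 of (6) at every sample that has a full window."""
    m = derivative_filter(r)
    out = np.zeros(len(row))
    for j in range(r, len(row) - r):
        out[j] = np.dot(m, row[j - r:j + r + 1])
    return out
```

For r = 2 this gives the five-point stencil (1, -8, 0, 8, -1)/12.
With a 4th order fit over only five samples the polynomial
interpolates the data, so any noise reduction comes from using a
larger window.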


3  A  Neural  Network  for  Matching 

We use a neural network containing binary neurons for
representing the disparity values between the two images. The
model consists of N_r \times N_c \times (D + 1) mutually
interconnected neurons, where D is the maximum disparity and
N_r and N_c are the image row and column sizes, respectively.
Let V = {v_{i,j,k}, 1 \le i \le N_r, 1 \le j \le N_c,
0 \le k \le D} be a binary state set of the neural network
with v_{i,j,k} (1 for firing and 0 for resting) denoting the
state of the (i, j, k)th neuron. In particular, when the neuron
v_{i,j,k} is 1, the disparity value is k at the point (i, j).
Every point is represented by D + 1 mutually exclusive neurons,
i.e. only one neuron is firing and the others are resting, due
to the uniqueness constraint of the matching problem. Let
T_{i,j,k;l,m,n} denote the strength (possibly negative) of the
interconnection between neuron (i, j, k) and neuron (l, m, n).
We require symmetry

    T_{i,j,k;l,m,n} = T_{l,m,n;i,j,k}

    for 1 \le i, l \le N_r, 1 \le j, m \le N_c and 0 \le k, n \le D.

We also insist that the neurons have self-feedback, i.e.
T_{i,j,k;i,j,k} \ne 0. In this model, each neuron (i, j, k)
randomly and asynchronously receives inputs
\sum T_{i,j,k;l,m,n} v_{l,m,n} from all neurons and a bias
input I_{i,j,k}:

    u_{i,j,k} = \sum_{l=1}^{N_r} \sum_{m=1}^{N_c} \sum_{n=0}^{D} T_{i,j,k;l,m,n} v_{l,m,n} + I_{i,j,k}        (8)

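Each u_{i,j,k} is fed back to the corresponding neurons after
maximum evolution

    v_{i,j,k} = g(u_{i,j,k})        (9)

where g(x_{i,j,k}) is a nonlinear maximum evolution function
whose form is taken as

    g(x_{i,j,k}) = \begin{cases} 1 & \text{if } x_{i,j,k} = \max(x_{i,j,l};\ l = 0, 1, \ldots, D) \\ 0 & \text{otherwise.} \end{cases}        (10)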


In this model, the state of each neuron is updated by using
the latest information about the other neurons. The uniqueness
constraint of the matching problem is ensured by a batch
updating scheme: the D + 1 neurons at an image point are
updated simultaneously at each step.
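
The maximum evolution function amounts to a winner-take-all
over the disparity axis. A minimal sketch, assuming NumPy and
an input array u of shape (N_r, N_c, D + 1), is given below.
As written, (10) would let every tied maximum fire; this sketch
instead breaks ties toward the smallest k so that exactly one
neuron fires per point, which is an assumption and not something
stated in the text.

```python
import numpy as np


def max_evolution(u):
    """Return binary states v with one firing neuron per image point.

    u has shape (Nr, Nc, D + 1). Ties are broken toward the smallest
    disparity index (an assumption of this sketch).
    """
    winners = np.argmax(u, axis=2)          # index of the largest input
    v = np.zeros(u.shape, dtype=np.uint8)
    rows, cols = np.indices(winners.shape)
    v[rows, cols, winners] = 1              # batch update of D + 1 neurons
    return v
```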


3.1  Estimation  of  Model  Parameters 

The  neural  model  parameters,  the  interconnection  strengths 
and  the  bias  inputs,  can  be  determined  in  terms  of  the  energy 
function  of  the  neural  network.  As  defined  in  [16],  the  energy 
function  of  the  neural  network  can  be  written  as 

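    E = -\frac{1}{2} \sum_{i=1}^{N_r} \sum_{l=1}^{N_r} \sum_{j=1}^{N_c} \sum_{m=1}^{N_c} \sum_{k=0}^{D} \sum_{n=0}^{D} T_{i,j,k;l,m,n} v_{i,j,k} v_{l,m,n}
        - \sum_{i=1}^{N_r} \sum_{j=1}^{N_c} \sum_{k=0}^{D} I_{i,j,k} v_{i,j,k}        (11)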

In order to use the spontaneous energy-minimization process of
the neural network, we reformulate the stereo matching problem
under the epipolar assumption as one of minimizing an error
function with constraints defined as

    E = \sum_{i=1}^{N_r} \sum_{j=1}^{N_c} \sum_{k=0}^{D} (g'_L(i, j) - g'_R(i, j \oplus k))^2 v_{i,j,k}
        + \frac{\lambda}{2} \sum_{i=1}^{N_r} \sum_{j=1}^{N_c} \sum_{k=0}^{D} \sum_{s \in S} (v_{i,j,k} - v_{(i,j) \oplus s,k})^2        (12)

where {g'_L(\cdot)} and {g'_R(\cdot)} are the first order
intensity derivatives of the left and right images,
respectively, S is an index set for all neighbors in a
5 \times 5 window centered at point (i, j), \lambda is a
constant and the symbol \oplus denotes that

    f_{a \oplus b} = \begin{cases} f_{a+b} & \text{if } 0 \le a + b \le N_c, N_r \\ 0 & \text{otherwise.} \end{cases}

The first term in (12), called the photometric constraint,
seeks disparity values such that all regions of the two images
are matched as closely as possible in a least squares sense.
The second term is the smoothness constraint on the solution.
The constant \lambda determines the relative importance of the
two terms to achieve the best results.

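By comparing the terms in the expansion of (12) with the
corresponding terms in (11), we can determine the
interconnection strengths and bias inputs as

    T_{i,j,k;l,m,n} = -48 \lambda \delta_{i,l} \delta_{j,m} \delta_{k,n} + 2 \lambda \sum_{s \in S} \delta_{(i,j),(l,m) \oplus s} \delta_{k,n}        (13)

and

    I_{i,j,k} = -(g'_L(i, j) - g'_R(i, j \oplus k))^2        (14)

where \delta_{a,b} is the Kronecker delta function (1 if a = b
and 0 otherwise).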
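
As a check on the constants in (13), the smoothness term of
(12) can be expanded. The sketch below assumes that S contains
the 24 off-center positions of the 5 x 5 window and ignores
image boundaries; both are assumptions made here for
illustration.

```latex
\begin{align*}
\frac{\lambda}{2}\sum_{i,j,k}\sum_{s\in S}
  \bigl(v_{i,j,k}-v_{(i,j)\oplus s,k}\bigr)^2
 &= \frac{\lambda}{2}\sum_{i,j,k}\sum_{s\in S}
    \bigl(v_{i,j,k}^2+v_{(i,j)\oplus s,k}^2\bigr)
    -\lambda\sum_{i,j,k}\sum_{s\in S} v_{i,j,k}\,v_{(i,j)\oplus s,k} \\
 &= 24\lambda\sum_{i,j,k} v_{i,j,k}^2
    -\lambda\sum_{i,j,k}\sum_{s\in S} v_{i,j,k}\,v_{(i,j)\oplus s,k}.
\end{align*}
% Matching against -\tfrac12 \sum T v v in (11):
%   self term:      -\tfrac12 T_{i,j,k;i,j,k} = 24\lambda  =>  T = -48\lambda
%   neighbor term:  -\tfrac12 T = -\lambda                 =>  T = 2\lambda
```

Each squared state appears 24 times as the center of a window
and 24 times as a neighbor, which is where the factor 48 comes
from.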


3.2  Stereo  Matching 

Stereo matching is carried out by neuron evaluation. Once the
parameters T_{i,j,k;l,m,n} and I_{i,j,k} are obtained using
(13) and (14), each neuron can randomly and asynchronously
evaluate its state and readjust accordingly using (8) and (9).

The initial state of the neurons was set as:

    v_{i,j,k} = \begin{cases} 1 & \text{if } I_{i,j,k} = \max(I_{i,j,l};\ l = 0, 1, \ldots, D) \\ 0 & \text{otherwise} \end{cases}        (15)

where I_{i,j,k} is the bias input.

However, since this neural network has self-feedback, i.e.
T_{i,j,k;i,j,k} \ne 0, the energy function E does not
necessarily decrease monotonically as a result of a
transition. This is explained as follows. Since we are using a
batch updating scheme, the (D + 1) neurons
{v_{i,j,k}; k = 0, ..., D} corresponding to the image point
(i, j) are simultaneously updated at each step. However, at
most two of the (D + 1) neurons change their state at each
step. Define the state changes \Delta v_{i,j,k} and
\Delta v_{i,j,k'} of neurons (i, j, k) and (i, j, k') and the
energy change \Delta E as

    \Delta v_{i,j,k} = v_{i,j,k}^{new} - v_{i,j,k}^{old}

    \Delta v_{i,j,k'} = v_{i,j,k'}^{new} - v_{i,j,k'}^{old}

and

    \Delta E = E^{new} - E^{old}

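Consider the energy function

    E = -\frac{1}{2} \sum_{i=1}^{N_r} \sum_{l=1}^{N_r} \sum_{j=1}^{N_c} \sum_{m=1}^{N_c} \sum_{k=0}^{D} \sum_{n=0}^{D} T_{i,j,k;l,m,n} v_{i,j,k} v_{l,m,n}
        - \sum_{i=1}^{N_r} \sum_{j=1}^{N_c} \sum_{k=0}^{D} I_{i,j,k} v_{i,j,k}        (16)

The change \Delta E due to the changes \Delta v_{i,j,k} and
\Delta v_{i,j,k'}, given by

    \Delta E = -(\sum_{l=1}^{N_r} \sum_{m=1}^{N_c} \sum_{n=0}^{D} T_{i,j,k;l,m,n} v_{l,m,n}^{old} + I_{i,j,k}) \Delta v_{i,j,k}
               -(\sum_{l=1}^{N_r} \sum_{m=1}^{N_c} \sum_{n=0}^{D} T_{i,j,k';l,m,n} v_{l,m,n}^{old} + I_{i,j,k'}) \Delta v_{i,j,k'}
               - \frac{1}{2} T_{i,j,k;i,j,k} (\Delta v_{i,j,k})^2 - \frac{1}{2} T_{i,j,k';i,j,k'} (\Delta v_{i,j,k'})^2
               - T_{i,j,k;i,j,k'} \Delta v_{i,j,k} \Delta v_{i,j,k'}        (17)

is not always negative. A simple proof for this may be found
in [17]. Therefore, the convergence of the network is not
guaranteed [18].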
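
The form of the energy change can be obtained in a few lines.
The following is a sketch rather than the proof cited in [17].
It writes the energy as E = -(1/2) v^T T v - I^T v with
symmetric T, and uses the shorthand a = (i, j, k) and
b = (i, j, k') for the two neurons whose states change; this
vector notation is introduced here for illustration only.

```latex
\begin{align*}
\Delta E
 &= -\tfrac12 (v+\Delta v)^{\mathsf T} T (v+\Delta v)
    - I^{\mathsf T}(v+\Delta v)
    + \tfrac12 v^{\mathsf T} T v + I^{\mathsf T} v \\
 &= -(Tv+I)^{\mathsf T}\Delta v
    - \tfrac12\,\Delta v^{\mathsf T} T\,\Delta v \\
 &= -u_a\,\Delta v_a - u_b\,\Delta v_b
    - \tfrac12 T_{aa}\,\Delta v_a^2 - \tfrac12 T_{bb}\,\Delta v_b^2
    - T_{ab}\,\Delta v_a\,\Delta v_b ,
\end{align*}
% where u = Tv + I is evaluated at the state before the update.
% With T_{aa} = T_{bb} = -48\lambda and \Delta v_a^2 = \Delta v_b^2 = 1,
% the two self-feedback terms add +48\lambda to \Delta E.
```

Because the self-feedback terms add a positive constant, choosing
the neuron with the largest input does not by itself make the
energy change negative.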

To ensure convergence of the network, we have designed a
deterministic decision rule. The rule is to take the new states
v_{i,j,k}^{new} and v_{i,j,k'}^{new} of neurons (i, j, k) and
(i, j, k') if the energy change \Delta E due to the state
changes \Delta v_{i,j,k} and \Delta v_{i,j,k'} is less than
zero. If \Delta E is not less than zero, no state change is
made. A stochastic decision rule can also be used to obtain a
globally optimal solution [19].

The  stereo  matching  algorithm  can  then  be  summarized  as 


1.  Set  the  initial  state  of  the  neurons. 


2.  Update the state of all neurons asynchronously and randomly
according to the decision rule.

3.  Check the energy function; if the energy does not change
anymore, stop; otherwise, go back to step 2.
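
The three steps can be sketched in a few dozen lines. This is a
minimal illustration under stated assumptions, not the authors'
program: it assumes NumPy, stores one disparity value per pixel
instead of D + 1 binary neurons, takes the derivative images gl
and gr as given, treats out-of-range right-image samples as
zero, and stops when a full sweep changes no state (equivalent
to the energy no longer changing, since every accepted update
lowers it). All function and parameter names are hypothetical.

```python
import numpy as np


def bias_inputs(gl, gr, D):
    """I[i, j, k] = -(gl[i, j] - gr[i, j + k])**2, as in (14)."""
    nr, nc = gl.shape
    I = np.empty((nr, nc, D + 1))
    for k in range(D + 1):
        shifted = np.zeros_like(gr)
        shifted[:, :nc - k] = gr[:, k:]     # out-of-range samples stay 0
        I[:, :, k] = -(gl - shifted) ** 2
    return I


def neighbor_counts(d, i, j, D):
    """Count neighbors with each disparity in the 5 x 5 window at (i, j)."""
    nr, nc = d.shape
    win = d[max(i - 2, 0):min(i + 3, nr), max(j - 2, 0):min(j + 3, nc)]
    n = np.bincount(win.ravel(), minlength=D + 1).astype(float)
    n[d[i, j]] -= 1.0                       # exclude the center pixel
    return n


def match(gl, gr, D, lam, max_sweeps=50, seed=0):
    rng = np.random.default_rng(seed)
    I = bias_inputs(gl, gr, D)
    d = np.argmax(I, axis=2)                # step 1: initial state
    nr, nc = d.shape
    for _ in range(max_sweeps):
        changed = False
        for p in rng.permutation(nr * nc):  # step 2: random, asynchronous
            i, j = divmod(p, nc)
            k = d[i, j]
            u = 2.0 * lam * neighbor_counts(d, i, j, D) + I[i, j]
            u[k] -= 48.0 * lam              # self-feedback of firing neuron
            k_new = int(np.argmax(u))       # maximum evolution
            if k_new == k:
                continue
            # Energy change for switching k -> k_new (cross term is zero).
            delta_e = u[k] - u[k_new] + 48.0 * lam
            if delta_e < 0:                 # deterministic decision rule
                d[i, j] = k_new
                changed = True
        if not changed:                     # step 3: nothing changes, stop
            break
    return d
```

With this representation the energy change for a switch reduces
to 2 * lam * (n[k] - n[k_new]) + (I[k] - I[k_new]), so only the
5 x 5 neighborhood of the pixel is ever touched.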


4  Experimental  Results 


A variety of images including random dot stereograms and
natural stereo image pairs were tested using our algorithm.
The random dot stereograms were created by the pseudo random
number generating method described in [20]. Each dot consists
of only one element. All the following random dot stereograms
are in the form of a three level “wedding cake”. The background
plane has zero disparity and each successive layer plane has
an additional two elements of disparity. In order to implement
this algorithm more efficiently on a conventional computer, we
make the following simplifications. Since only one of the
D + 1 neurons is firing at each point, we used one neuron
taking a value in the range 0 to D to represent the disparity
value instead of D + 1 neurons. From (13) one can see that the
interconnections between the neurons are local (a 5 \times 5
neighborhood) and have the same structure for all neurons.
Therefore, we used a 5 \times 5 window for computing u_{i,j,k}
and the energy function E instead of an
N_r N_c (D + 1) \times N_r N_c (D + 1) interconnection strength
matrix. The simplified algorithm greatly reduces the space
complexity while increasing the program complexity only
slightly. Therefore, it is very fast and efficient.
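
A test pair of this kind can be generated as follows. This is a
minimal sketch assuming NumPy's random generator rather than
the method of [20]. It also assumes a background plus three
square layers at disparities 0, 2, 4 and 6 (the text does not
say whether the background counts as one of the three levels)
and that a left-image point (i, j) matches the right-image
point (i, j + k). Occlusions are ignored.

```python
import numpy as np


def wedding_cake_stereogram(size=64, density=0.5, layers=3, step=2, seed=0):
    """Return (left, right, disp) for a layered random dot stereogram."""
    rng = np.random.default_rng(seed)
    disp = np.zeros((size, size), dtype=int)
    for n in range(1, layers + 1):
        m = n * size // (2 * (layers + 1))   # margin of the n-th layer
        disp[m:size - m, m:size - m] = n * step
    # One-element dots: 255 for white, 0 for black.
    right = np.where(rng.random((size, size)) < density, 255, 0)
    left = np.where(rng.random((size, size)) < density, 255, 0)
    cols = np.arange(size)[None, :] + disp   # matching column in right image
    valid = cols < size
    rows = np.broadcast_to(np.arange(size)[:, None], (size, size))
    left[valid] = right[rows[valid], cols[valid]]
    return left, right, disp
```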

Figure 1 shows a 10% density random dot stereogram. Intensity
values of the white and black elements are 255 and 0,
respectively. Figure 1(c) is the resulting disparity map after
14 iterations. The disparity values are encoded as intensity
values with the brightest value denoting the maximum disparity
value. We used \lambda = 20, D = 6 and r = 2 (i.e. the window
size was 5). Note that the disparity map is dense.

A similar test was run on the decorrelated stereogram [7]. The
original stereogram has 50% density random dots. In the left
image, 20% of the dots were decorrelated at random. By setting
\lambda = 2800, D = 6 and r = 2, the dense disparity map in
Figure 2(c) was obtained after 12 iterations.

Owing to page limitations, natural image examples are not
given in this paper.

(c)  Disparity map represented by intensity image.
Figure 1:  A 10% density random dot stereogram.

(c)  Disparity map represented by intensity image.
Figure 2:  A 50% density random dot stereogram. In the left
image, 20% of the dots were decorrelated at random.

The design of an optical computer must be based on the
characteristics of optics and optical technology, and not on
those of electronic technology. In this paper the property of
optical superposition is considered and the implications it
has for the design of computing systems are discussed. It can
be exploited in the implementation of optical gates,
interconnections, and shared memory.

INTRODUCTION 

Fundamental differences in the properties of electrons and
photons provide for expected differences in computational
systems based on these elements. Some, such as the relative
ease with which optics can implement regular, massively
parallel interconnections, are well known. In this paper we
examine how the property of superposition of optical signals
in a linear medium can be exploited in building an optical or
hybrid optical/electronic computer. This property enables many
optical signals to pass through the same point in space at the
same time without causing mutual interference or crosstalk.
Since electrons do not have this property, this may shed more
light on the role that optics could play in computing. We will
separately consider the use of this property in
interconnections, gates, and memory.

INTERCONNECTIONS 
A technique for implementing optical interconnections from one
2-D array to another (or within the same array) has been
described [Jenkins et al., 1984]. It utilizes two holograms in
succession (Fig. 1). The holograms


Fig.  1.  Optical  holographic  system  for  interconnections. 


idea is to define a finite number, M, of distinct
interconnection patterns, and then assemble the
interconnecting network using only these M patterns. The
second hologram of Fig. 1 consists of an array of facets, one
for each of the M interconnection patterns. The first hologram
contains one facet for each input node, and serves to address
the appropriate patterns in the second hologram.

It is the superposition property that makes this interesting.
Note that many different signal beams can pass through the
same facet of the second hologram at the same time without
causing mutual interference. (All of these signals merely get
shifted in the same direction and by the same amount.) This
feature decreases the complexity of both holograms: the first
because it only has to address M facets, the second because it
only has M facets. Let N be the number of nodes in the input
and output arrays. The complexity (number of resolvable spots)
of each hologram can be shown to be proportional to NM, with
the proportionality constant being approximately 25 [Jenkins
et al., 1984].

Using this as a model for interconnections in parallel
computing, the complexity of these optical interconnections
can be compared with that of electronic VLSI for various
interconnection networks. Results of this comparison have been
given in [Giles and Jenkins, 1986]. It is found that in
general the optical interconnections have an equal or lower
space complexity than electronic interconnections, with the
difference becoming more pronounced as the connectivity
increases.

SHARED  MEMORY 

The  same  superposition  principle  can  be  applied  to 
memory  cells,  where  many  optical  beams  can  read  the 
same  memory  location  simultaneously.  This  concept 
could  be  useful  in  building  a  parallel  shared  memory 
machine. 

For this concept, we first consider abstract models of
parallel computation based on shared memories. The reason for
this approach is to abstract out inherent limitations of
electronic technology (such as limited interconnection
capability); in designing an architecture one would adapt the
abstract model to the limitations of optical systems. These
shared memory models are basically a parallelization of the
Random Access Machine.

The Random Access Machine (RAM) model [Aho, Hopcroft, and
Ullman, 1974] is a model of sequential computation, similar to
but less primitive than the Turing machine (TM). The RAM model
is a one-accumulator computer in which the instructions are
not allowed to modify themselves. A RAM consists of a
read-only input tape, a write-only output tape, a program and
a memory. The time on the RAM is bounded above by a polynomial
function of time on the TM. The program of a RAM is not stored
in memory and is unmodifiable. The RAM instruction set is
small and consists of operations such as store, add, subtract,
and jump if greater than zero; indirect addresses are
permitted. A common RAM model is the uniform cost one, which
assumes that each RAM instruction requires one unit of time
and each register one unit of space.

Shared memory models are based on global memories and are
differentiated by their accessibility to memory. In Fig. 2 we
see a typical shared memory model where individual processing
elements (PE’s) have variable simultaneous access to an
individual memory cell. Each PE can access any cell of the
global memory in unit time. In addition, many PE’s can access
many different cells of the global memory simultaneously. In
the models we discuss, each PE is a slightly modified RAM
without the input and output tapes, and with a modified
instruction set to permit access to the global memory. A
separate input for the machine is provided. A given processor
can generally not access the local memory of other processors.

Fig. 2.  Conceptual diagram of shared memory models.

The various shared memory models differ primarily in whether
they allow simultaneous reads and/or writes to the same memory
cell. The PRAC, parallel random access computer (Lev,
Pippenger and Valiant, 1981), does not allow simultaneous
reading or writing to an individual memory cell. The PRAM,
parallel random access machine (Fortune and Wyllie, 1978),
permits simultaneous reads but not simultaneous writes to an
individual memory cell. The WRAM, parallel write random access
machine, denotes a variety of models that permit simultaneous
reads and certain writes, but differ in how the write
conflicts are resolved. For example, a model by Shiloach and
Vishkin (1981) allows a simultaneous write only if all
processors are trying to write the same value. The
paracomputer (Schwartz, 1980) has simultaneous writes but only
"some" of all the information written to the cell is recorded.
The models represent a hierarchy of time complexity given by

    T_{PRAC} \ge T_{PRAM} \ge T_{WRAM}

where T is the minimum number of parallel time steps required
to execute an algorithm on each model. More detailed
comparisons are dependent on the algorithm (Borodin and
Hopcroft, 1985).
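
The access rules that separate the three models can be stated
as a small predicate. The sketch below is illustrative only:
the function name and the representation of a time step as
lists of read addresses and (address, value) writes are
assumptions, the WRAM case follows the Shiloach and Vishkin
rule quoted above, and a read and a write to the same cell in
one step is permitted for PRAM and WRAM because the text does
not address that case.

```python
from collections import Counter, defaultdict


def step_allowed(model, reads, writes):
    """Check whether one parallel step is legal under a memory model.

    reads  : list of addresses read in this step (one per PE access)
    writes : list of (address, value) pairs written in this step
    """
    readers = Counter(reads)
    written = defaultdict(list)
    for addr, value in writes:
        written[addr].append(value)
    for addr in set(readers) | set(written):
        n_reads = readers[addr]
        values = written.get(addr, [])
        if model == "PRAC":      # no simultaneous access of any kind
            ok = n_reads + len(values) <= 1
        elif model == "PRAM":    # simultaneous reads, single writer
            ok = len(values) <= 1
        elif model == "WRAM":    # simultaneous writes of the same value
            ok = len(set(values)) <= 1
        else:
            raise ValueError("unknown model: " + model)
        if not ok:
            return False
    return True
```

Any step that is legal under PRAC is legal under PRAM, and any
step legal under PRAM is legal under this WRAM variant, which
is the reason the time hierarchy runs in the stated direction.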

In general, none of these shared memory models is physically
realizable because of actual fan-in limitations. As an
electronic example, the ultracomputer (Schwartz, 1980) is an
architectural manifestation of the paracomputer that uses a
hardwired Omega network between the PE’s and memories; it
simulates the paracomputer
within  a  time  penalty  of  O  (log2n  ). 

Optical systems could in principle be used to implement this
parallel memory read capability. As a simple example, a single
1-bit memory cell can be represented by one pixel of a 1-D or
2-D array; the bit could be represented by the state (opaque
or transparent) of the memory cell. Many optical beams can
simultaneously read the contents of this memory cell without
contention (Fig. 3). In addition to this, an interconnection
network is needed between the PE’s and the memory that can
allow any PE to communicate with any memory cell, preferably
in one step, and with no contention. A regular crossbar is not
sufficient for this because fan-in to a given memory cell must
be allowed. Figure 4 shows a conceptual block diagram of a
system based on the PRAM model; here the memory array operates
in reflection instead of transmission. The fan-in required of
the interconnection network is also depicted in the figure.

Fig. 3.  One memory cell of an array, showing multiple optical
beams providing contention-free read access.

Fig. 4.  Block diagram of an optical architecture based on
parallel RAM models.

Optical systems can potentially implement crossbars that also
allow this fan-in. Several optical crossbar designs discussed
in (Sawchuk, et al., 1986) exhibit fan-in capability. An
example is the optical crossbar shown schematically in Fig. 5.
The 1-D array on the left could be optical sources (LED’s or
laser diodes) or just the location of optical signals entering
from previous components. An optical system spreads the light
from each input source into a vertical column that illuminates
the crossbar mask. Following the crossbar mask, a set of
optics collects the light transmitted by each row of the mask
onto one element of the output array. The states of the pixels
in the crossbar mask (transparent or opaque) determine the
state of the crossbar switch. Multiple transparent pixels in a
column provide fanout; multiple transparent pixels in a row
provide fan-in. Many optical reconfigurable network designs
are possible, and provide tradeoffs in performance parameters
such as bandwidth, reconfiguration time, maximum number of
lines, hardware requirements, etc. Unfortunately, most simple
optical crossbars will be limited in size to approximately
256 x 256 (Sawchuk, et al., 1986). We are currently
considering variants of this technique to increase the number
of elements. Possibilities include using a multistage but
nonblocking interconnection network (e.g. Clos), a hierarchy
of crossbars, and/or a memory hierarchy.

Fig. 5.  Example of an optical crossbar interconnection
network.
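
The behavior of the mask-based crossbar can be modeled with a
boolean matrix. This is a minimal logical sketch assuming
NumPy, not an optical simulation: each output is treated as
"on" whenever any light reaches it, which is an assumption
about the detector, and the function name is hypothetical.

```python
import numpy as np


def crossbar(mask, inputs):
    """Logical model of the crossbar mask.

    mask[r, c] is True when the pixel in row r, column c is
    transparent, i.e. input c is connected to output r.
    """
    mask = np.asarray(mask, dtype=bool)
    inputs = np.asarray(inputs, dtype=bool)
    # Input c illuminates column c; output r collects all of row r.
    return (mask & inputs[None, :]).any(axis=1)
```

For example, with mask rows (1, 1, 0), (0, 0, 1) and (1, 0, 0),
input 0 fans out to outputs 0 and 2, while output 0 has fan-in
from inputs 0 and 1.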


GATES 

Since the superposition property of optics only applies in
linear media, it cannot in general be used for gates, which of
course are inherently nonlinear. However, for important
special cases superposition can allow many optical gates to be
replaced with one optical switch.

Consider again the situation depicted in Fig. 3, with the
aperture being used as a switch or relay. The control beam
opens or closes the relay; when the relay is closed (i.e. the
aperture is transparent), many optical signal beams can
independently pass through the relay. If b represents the
control beam and a_i the signal beams, this in effect computes
b \cdot a_i or \bar{b} \cdot a_i, depending on which state of
b closes the relay, where \cdot denotes the AND operation
(Fig. 6).


Fig.  6.  One  optical  relay  or  superimposed  gate  versus 
individual  gates  with  a  common  input. 

Using this concept, a set of gates with a common input in an
SIMD machine can be replaced with one optical switch or
“superimposed gate”. It also obviates the need for
broadcasting the instructions to all PE's; instead, a fan-in
of all signals to a common control switch is performed.
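
Logically, a superimposed gate applies one control value to
every signal beam at once. The sketch below is an illustration
only; the function name and the closes_on parameter (which
state of the control beam makes the aperture transparent) are
assumptions introduced here.

```python
def superimposed_gate(b, signals, closes_on=True):
    """AND every signal a_i with the common control input.

    closes_on selects which state of the control beam b closes the
    relay (makes the aperture transparent).
    """
    relay_closed = (bool(b) == closes_on)
    # One switch state is shared by all beams passing through it.
    return [bool(a) and relay_closed for a in signals]
```

A single evaluation of relay_closed stands in for the N separate
gates that would otherwise each receive a copy of the control
input.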

These superimposed gates are not true 3-terminal devices. The
common (b) input is regenerated, but the a_i inputs are not.
As a result, a design constraint must be adhered to: these a_i
signals must not go through too many superimposed gates in
succession without being regenerated by a conventional gate.
Another consequence is that the total switching energy
required for a given processing operation is reduced, because
N gates are replaced with one superimposed gate. This is
important because it is likely that the total switching energy
will ultimately be the major limiting factor on the switching
speed and number of gates in an optical computer. Other
advantages include an increase in computing speed, since some
of the gates are effectively passive, and reduced requirements
on the device used to implement the optical gates.

CONCLUSIONS 

We have shown that the property of superposition can be
exploited in the design of optical or hybrid
optical/electronic computing architectures. It can reduce the
hologram complexity for highly parallel interconnections,
reduce the number of gates in a SIMD system, and permit
simultaneous memory access in a parallel shared memory
machine, thereby reducing contention problems. Our fundamental
premise in studying this is that architectures for optical
computing must be designed for the capabilities and
limitations of optics; they must not be constrained by the
limitations of electronic systems, which have necessarily
dominated approaches to digital parallel computing
architectures to date.